
DIRECT MARKETING ANALYTICS JOURNAL
An Annual Publication from the Direct Marketing Association Analytics Council

2007

Letter From The Chair 1

Devyani Sadh, Ph.D., CEO & Founder, Data Square, LLC

Answering the “Optimal Modeling Sample Size” Conundrum: A Case Study from the Financial Sector 3

Krishna Mehta, Nitin Kumar Jain, and Martin Ahrens, Inductis

Clustering Before Classification for Improved Response Rates 9

Steve Gallant, Ph.D. and Robert Cooley, Ph.D., KXEN

Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models 14

Nicholas J. Radcliffe, Ph.D., Portrait Software and the University of Edinburgh

Developing “Linkage Models” for Enhancing B-to-B Acquisition Strategies 22

Gabrielle Bedewi and Paul Raca

Cross-Selling Optimization: A Model-Based Framework and its Applications 28

Hongjie Wang, David King, and Sean Christy, Fulcrum Analytics Inc.


CHAIR Devyani Sadh, Ph.D.

CEO & Founder, Data Square, LLC

VICE CHAIR & INTER-COUNCIL LIAISON Brad Rukstales

President, CAC Group, Inc.

IMMEDIATE PAST CHAIR John Carter

Senior Vice President, Equifax

CONFERENCE CHAIR / WEBSITE CHAIR Richard Deere

President, Direct Data Mining Consultants Inc.

JOURNAL CHAIR Ziyong Cai

Director, CRM Modeling, Starwood Hotels & Resorts Worldwide

MEMBERSHIP CHAIR Leo Kluger

Senior Manager, Market Data & Analytics, IBM

MEMBERSHIP COMMITTEE Todd King

Director, Analysis Services, Acxiom Digital

David Miller, SVP, Global Segmentation, Claritas

MEMBERSHIP COMMITTEE / WEBSITE CHAIR Helene Miller

Analytic Strategist, IBM

PROGRAMMING SEMINARS CHAIR Jacque Paige

Executive Recruiter, Smith Hanley Associates

PROGRAMMING VIRTUAL SEMINARS CHAIR Steve Briley

VP, Analytical Services, Merkle Direct Marketing, Inc.

NEWSLETTER CHAIR John Young

SVP, Analytic Consulting Group, Epsilon

BLOG CHAIR Parth Srinivasa

Director, Database Marketing, CDW Corporation

Tatiana Rios, Associate Event Producer

Diane Kaminskas, Council Coordinator

Maria Benevenga, Newsletter Designer

DMA STAFF

MISSION STATEMENT: The mission of the DMA Analytics Council is to be the primary resource within the direct marketing community for leveraging information and analytics to drive direct marketing programs. The Analytics Council sponsors educational luncheons, seminars, idea exchange forums, and publications for communicating state-of-the-art applications such as segmentation, predictive modeling, data mining, and primary analytics, with an emphasis on showing examples of how they can be used to create successful acquisition, loyalty, and cross-sell marketing programs.

DMA ANALYTICS COUNCIL 2006-2007 Operating Committee


Letter From The Chair

June 2007

Dear Analytics Council Member,

I would like to begin by thanking our Journal Chair, Ziyong Cai, Ph.D. of Starwood Hotels and Resorts, for all his hard work in putting together the 2007 Analytics Council Journal! This release includes a broad mix of articles, ranging from acquisition to cross-sell, and from sample size determination to innovative applications of modeling techniques. I hope that you enjoy these articles as much as I did.

Here's a brief preview:

• Krishna Mehta, Nitin Kumar Jain, and Martin Ahrens of Inductis use a case study approach from the financial sector to determine the “Optimal Modeling Sample Size.” In their article, they note that the event rate of the target variable and the number of predictors are the key influencers.

• Steve Gallant, Ph.D. and Robert Cooley, Ph.D. of KXEN demonstrate how “Clustering Before Classification” can improve response rates, especially in the top deciles, with minimal increases in labor costs.

• Nicholas J. Radcliffe, Ph.D. of Portrait Software and the University of Edinburgh compares various methodologies for “Building and Assessing Uplift Models.” He introduces quality measures appropriate for assessing the performance of uplift models with both binary and continuous outcomes.

• Gabrielle Bedewi, Ph.D. and Paul Raca illustrate a B-to-B acquisition application of the Order Structure Multinomial Logistic Model for handling data with different coverage levels (e.g., survey data and compiled list data) via a two-stage modeling framework.

• Hongjie Wang, Ph.D., David King, and Sean Christy of Fulcrum Analytics discuss cross-selling optimization techniques driven by forecasts of product category preferences and timing of the next purchase, based on a multi-spell discrete survival modeling technique.

Let me also take this opportunity to update you on some exciting activities that the council has been involved in:

I am absolutely delighted to announce the launch of the Analytics Council Blog, thanks to the efforts of our Blog Chair, Parth Srinivasa from CDW. The Analytics Blog is a great place for building community and for posting comments in an interactive format. Among other things, we are hoping to use the blog as a launching pad for our Tough Nuts program. Please take this opportunity to receive free consulting from some very highly regarded analytic thought leaders. All you need to do is post a question or invite comments on a subject area, and we can get some very interesting discussions going. Please visit us at http://www.the-dma.blogs.com/analytics/

We have revised our vision and mission statements to expand our target audience to include management involved in the utilization of analytics, in addition to trained analytics practitioners with advanced degrees. Please see http://www.the-dma.org/councils/analyticscouncil/ for details. To support this initiative, we are working on creating more exciting events that would promote networking for executives and middle management. We are actively



exploring the possibility of corporate sponsorships, and if you are interested in sponsoring one of our executive or analyst networking events, please contact me at

[email protected]

We are hoping that our next release will be a peer-reviewed analytics journal, thus making it more prestigious. If you are interested in being a reviewer, please contact Ziyong Cai, Ph.D. at [email protected].

A fourth exciting event in the works is the 2008 DMA Analytics Council Modeling Challenge. This is a great opportunity for Analytics Council member companies to test their predictive modeling methodologies against other participants. The winning company will be selected based on a set of pre-determined objective criteria for predictive accuracy. Winning companies will have the opportunity to present their methodologies at a key DMA event (to be confirmed). Please be sure to enroll in the program as soon as it is announced, as we are currently able to support only a limited number of entrants. Last year's challenge was an outstanding success, with numerous participating companies and standing-room-only attendance at the DMA Annual Conference.

Last but not least, I want to ensure that you are aware of our free Mentoring program and Analytics Council Certificate.

Please contact me, or visit our Web site for details, and if you haven't already enrolled in these programs, I would like to ask that you take advantage of these unique opportunities.

Finally, we need your help to spread the word and grow the Council. Please send me an email at [email protected] or call me at 03. . 33 Ext. 0 if you would like to share your ideas or time in enhancing the council. ■

Make it a great day!

Devyani Sadh, Ph.D. Chair, DMA Analytics Council CEO & Founder, Data Square, LLC

[Calendar sidebar: upcoming DMA events in 2007, including events on August 1 and 2 in New York, NY, the New York Nonprofit Conference (August 7, New York, NY), an event on August 13 in Long Beach, CA, the DMA 2007 Conference & Exhibition (October 13–18, Chicago, IL), and the National Center for Database Marketing conference (December 10–12, Las Vegas, NV); full listing at www.the-dma.org/conferences]


Answering the “Optimal Modeling Sample Size” Conundrum: A Case Study from the Financial Sector

Krishna Mehta, Nitin Kumar Jain, and Martin Ahrens, Inductis

Abstract

Theoretical statistical literature offers a lot of guidance on the selection of an appropriate sample size. These principles are largely based on tradeoffs between Type I and Type II errors. We revisit this question in the context of choosing an appropriate modeling sample from a direct marketer's perspective. We therefore focus on the impact sample size has on model lift performance, as well as on Type I and Type II errors.

In this case study, we address this question by varying the modeling sample size across ten different direct-marketing-type datasets. The datasets vary across industries, event rates, number and type of variables, as well as overall size. We find that the event rate of the target variable plays an important role in Type I and Type II errors. Further, from the lift performance perspective, the optimal sample size is heavily influenced by the number of variables actually included in the modeling exercise. We also provide some guidance on sample selection for a modeling exercise.

Introduction

In a world where there were not many financial, computational, or time constraints on working with data, there would be no need to worry about sampling, as we could effectively work with the largest available dataset. However, in the real world, these constraints make sampling a necessity, and smart sampling decisions need to be made.

Theoretical statistical literature offers a lot of guidance on the selection of an appropriate sample size. These principles are

largely based on tradeoffs between Type I and Type II errors. We revisit this question about Type I and Type II errors in the context of choosing an appropriate modeling sample.

In this case study, we find that the event rate of the target variable plays a big role in the extent of Type I and Type II errors. We do not find conclusive results on the impact of sample size on Type I and Type II errors.

From a direct marketing practitioner's standpoint, a model is only as good as the lift it provides, which brings up another very important question: how does the choice of modeling sample size affect lift?

The question raised above can be addressed in two contexts. The first is when no model has been developed and an appropriate sampling strategy is being determined. A second context in which these questions are relevant occurs when the model has been developed and an effort is being made to determine how much additional lift is obtainable from the available data.

In the first context, the emphasis is on avoiding pitfalls so that the best model is developed without having any first-hand information on the lifts (though some ballpark figures might be known from past experience). In this study, we provide some guidance on things to look out for in these situations.

In the second context, lift from the chosen sample has been determined, and the emphasis is on improving it further. For this stage, the article provides some guidelines that can help determine the likelihood of improved model performance through re-sampling.



We find that sample size does influence the lift performance of the model. However, there is no unique sample size for all situations; it depends on the number of variables present in the dataset.

For this study, we have taken great care in selecting datasets that represent a reasonable variation across the dimensions of interest. The datasets used in this case study are primarily from the financial sector, and vary across event rates, number and type of variables, as well as overall records. However, we acknowledge limitations due to the use of a limited number of master datasets. In generalizing the results presented here, the reader should remember that, even though carefully structured, this is a case study.

Datasets

To understand the impact of sample size on model performance and on Type I and Type II errors, we conducted the following exercise. Ten master modeling datasets were chosen for the study. The datasets were selected to represent different application areas, as well as to include variability in sample size and variables. The subject matter of the datasets included marketing response, customer acquisition, attrition, and default. The sample sizes varied from a few thousand to over a hundred thousand records, and the number of variables varied from relatively few to over a hundred. Two validation datasets were also selected for each of the master datasets.

An appropriate binary response variable was created for each of the modeling datasets. For example, for a dataset from a marketing campaign, the dependent variable captured response to a direct mail promotion. A logistic regression model was developed for each of the datasets using two approaches: (a) a derived-variable approach, in which the raw variables were first transformed via principal component analysis, and (b) a raw-variable approach, in which the raw variables were used directly in the model. This was done to generate additional data points for the analysis.
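As a rough, illustrative sketch of the two approaches (this is not the authors' code; the data, library choices, and parameter values are placeholders), the two logistic regression variants could be set up as follows:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: X is the predictor matrix, y a binary response
# (e.g., a direct mail response flag with a low event rate).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 40))
y = (rng.random(5000) < 0.05).astype(int)

# (a) Derived-variable approach: principal components, then logistic regression.
derived = make_pipeline(StandardScaler(), PCA(n_components=10),
                        LogisticRegression(max_iter=1000)).fit(X, y)

# (b) Raw-variable approach: logistic regression directly on the raw predictors.
raw = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
```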

Each of the modeling datasets was further broken up into sub-datasets representing successively smaller fractions of its observations. This was done to allow us to study the impact of sample size while controlling for dataset-specific peculiarities at the same time. Models were developed on the sub-samples as well as on the full dataset. For the study of the impact of sample size on lift performance, models were built for every combination of sub-sample and modeling technique across the ten modeling datasets, and each model was validated against the two validation datasets.

For the study of the impact of sample size on Type I and Type II errors, principal component analysis was not performed, and the sub-samples of the ten master datasets were used to generate the models. Validation results from these models were used in the analysis.

Results

Type I and Type II Errors

It will be useful to begin this section with our definition of Type I and Type II errors. Type I error is defined as the ratio of cases that are predicted as targets but are actually non-targets, to all cases that are predicted as targets. Type II error is defined as the ratio of cases that are predicted as non-targets but actually belong to the target class, to all cases that are predicted as non-targets. Note that this definition of Type I error corresponds to false positives, and this definition of Type II error corresponds to false negatives.

To calculate Type I and Type II errors, we scored each model on a validation dataset and then designated all records above the cutoff as predicted targets, and those below the cutoff as predicted non-targets. This methodology is standard practice in direct marketing exercises. Analysis was performed with cutoffs at the top 10%, top 20%, and top 30% of the scored list. We observed that Type I error went up, and Type II error went down, as we relaxed the cutoff from 10% to 30%. However, the relationship of Type I and Type II error with sample size and event rate was similar across the different cutoffs, so results are reported here only for the cutoff at the twentieth percentile. Bivariate and multivariate methods were used to analyze the relationships, and the results are graphically represented below.
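As a hedged illustration of this calculation (not the authors' code; `scores` and `actuals` are assumed arrays of model scores and 0/1 outcomes), the two error rates at a top-of-list cutoff can be computed like this:

```python
import numpy as np

def type_i_ii_errors(scores, actuals, cutoff_fraction=0.20):
    """Type I and Type II errors as defined in the text: records in the top
    `cutoff_fraction` of scores are predicted targets, the rest non-targets."""
    scores = np.asarray(scores, dtype=float)
    actuals = np.asarray(actuals, dtype=int)      # 1 = target, 0 = non-target

    order = np.argsort(-scores)                   # best scores first
    n_selected = int(np.ceil(cutoff_fraction * len(scores)))
    predicted_target = np.zeros(len(scores), dtype=bool)
    predicted_target[order[:n_selected]] = True

    # Type I: predicted targets that are actually non-targets,
    # as a fraction of all predicted targets.
    type_i = (actuals[predicted_target] == 0).mean()
    # Type II: predicted non-targets that are actually targets,
    # as a fraction of all predicted non-targets.
    type_ii = (actuals[~predicted_target] == 1).mean()
    return type_i, type_ii
```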

Figures 1 and 2 (see page 5) show the impact of the target event rates on Type I and Type II errors. We see that Type I error goes down as the target event rate increases. Further, Type II error goes up as the event rate increases. This result is very intuitive, because a higher event rate implies that a given sample would have more of the target class available and, correspondingly, fewer non-targets.

Therefore, fewer non-targets are likely to make it onto the target list, as there are fewer of them to begin with. This will lower Type I error. For similar reasons, more targets are likely to make it onto the non-target list and increase Type II error.


We also did some analysis of the impact of sample size on Type I and Type II errors. We did not find any relationship that was statistically significant. This could be a manifestation of the limited number of data points used in this case study.

Lift Performance

We begin our analysis of the results on lift performance by looking at the model performance generated across the sub-samples. We measured model performance using lift generated at the twentieth percentile, as well as the KS statistic. We found the results to be very similar for both metrics, and report only the lift metric. We chose lift at the twentieth percentile because it is a very common cutoff used in direct marketing exercises. Figure 3 represents the average validation lifts across all the datasets for the sub-sample fractions and the full population. We looked at modeling lifts as well; the results followed a similar pattern and are not reported here.
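The two performance metrics are straightforward to compute from validation scores; the following is a minimal sketch under the same assumptions as above (illustrative names, not the authors' implementation):

```python
import numpy as np

def lift_at_fraction(scores, actuals, fraction=0.20):
    """Lift at a cutoff: response rate in the top `fraction` of scores
    divided by the overall response rate."""
    scores = np.asarray(scores, dtype=float)
    actuals = np.asarray(actuals, dtype=int)
    top = np.argsort(-scores)[: int(np.ceil(fraction * len(scores)))]
    return actuals[top].mean() / actuals.mean()

def ks_statistic(scores, actuals):
    """Kolmogorov-Smirnov statistic: maximum separation between the score
    distributions of targets and non-targets."""
    scores = np.asarray(scores, dtype=float)
    actuals = np.asarray(actuals, dtype=int)
    thresholds = np.unique(scores)
    cdf_targets = np.array([(scores[actuals == 1] <= t).mean() for t in thresholds])
    cdf_nontargets = np.array([(scores[actuals == 0] <= t).mean() for t in thresholds])
    return float(np.max(np.abs(cdf_targets - cdf_nontargets)))
```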

As the results in Figure 3 indicate, sample size does have an impact on lift: the smallest sample fraction does distinctly worse, whereas the marginal impact of increasing the sample size diminishes significantly as we move to higher-fraction sub-samples.

While this result (Figure 3) indicates that larger fractions of the dataset might be better to work with, one should keep in mind that these are average trends across all the datasets included in the study. A number of questions remain unanswered: What is the impact of the actual size of the dataset? Does the number of variables available for modeling make a difference? Do event rates or the level of lift have a role to play?

Figure 1: Type I Error across Event Rate. Figure 2: Type II Error across Event Rate. Figure 3: Lift Variation across Population Samples.

The questions raised above can best be answered by a closer look at the lift performance of these datasets. To perform this analysis, we created five lift performance bands. To do that, we took the maximum lift that was obtained from each of the ten master datasets and then calculated its difference from the


lift obtained from each of the samples estimated for that master dataset. We then divided these differences into five bands: band 1 contained samples whose lift was closest to the maximum, bands 2 and 3 covered progressively larger deviations of up to five units, band 4 covered deviations of between five and ten units, and band 5 covered deviations of more than ten units from the maximum lift. Band 1 indicated the best scenario, and records in bands 4 and 5, which had lift deviations of five units or greater from the maximum, were the worst cases.

Simple ANOVA was performed to identify key sample characteristics that varied across the five lift performance bands. A few of these characteristics are presented visually here.

Figure 4 (left) shows the lift performance across modeling sub-samples. We observe that the average value of the fractions declines over the bands. In other words, lower lifts are more often associated with smaller sample fractions. But at the same time, lower-fraction datasets also show up frequently in the good lift performance bands. Therefore, there must be other factors that influence this relationship.

One of the reasons that smaller fractions also show up in the good lift performance bands is that the size varies across the datasets. One would expect a given fractional sample from a very large dataset to perform much better than the same fraction drawn from a much smaller dataset. We therefore decided to redo the above analysis with the actual size of the sub-sample. Figure 5 (left) shows the actual sample size across lift performance bands. Again, while there is an overall decline in average value, small samples show up in the good bands, and big samples in the bad bands. Therefore, sample size by itself cannot explain lift performance entirely.

The analysis to this point indicates that a multivariate approach might be needed to get a better handle on this problem. To do this, we used a two-tiered approach: (a) we created some ratios to allow for the combined effect of the variables, and (b) we used these ratios to identify multivariate relationships. We created ratios that involved sample size, event rates, and number of variables. Another reason behind the creation of these ratios was to help us come up with easy-to-implement guidelines for modeling sample selection. Of the ratios that were created, only one came out as significant in our analysis.

Figure 4: Fraction (Percent) of Modeling Dataset vs. Lift Performance

Figure 5: Sample Size vs. Lift Performance

Figure 6: Stability Ratio vs. Lift Performance


Stability Ratio

This is the sample size divided by the number of variables present in the dataset. A low ratio indicates fewer observations for a given number of variables, and therefore a less stable dataset for the modeling exercise. The profiling performed with the stability ratio and other variables is used to derive rules that will help us answer questions about the optimality of the chosen sample for obtaining lift. (See Figure 6, page 6.)

The first situation in which this analysis can be useful is before any modeling exercise has been performed, and we want to avoid the pitfalls in our sample selection strategy. We call this ex ante analysis. The second situation in which this is useful is after the model has been developed, and we have obtained the lift, and we are trying to determine whether the lift is adequate or whether we can do better. We call this ex post analysis.

Ex Ante Analysis Guidelines

This section provides some guidelines to help avoid the selection of inadequate samples at the time of the modeling exercise. We derived these results by developing a classification tree using sample size and stability ratio. The tree's target variable is the occurrence of model lift five points or greater below the best lift attained with the dataset. Only nodes that were heavily populated and extremely weighted towards the event or the non-event were selected. This led to a large fuzzy area where nothing could be inferred. This area was reduced using additional univariate profiling. The results are presented in Figure 7.

In the figure, Very Safe implies a high likelihood that lift will be within five points of optimal. Safe means a somewhat lower, but still favorable, likelihood of lift being within five points of optimal. High Risk indicates a substantial chance of lift being more than five points away from optimal. Fuzzy is everything that is not covered by the above categories.

A very intuitive result is that a very low sample size or stability ratio can lead to unstable situations, while very large stability ratios lead to a very low chance of missing the optimal lift. An interesting finding is that sufficiently large modeling samples, or sufficiently large stability ratios combined with an adequately large sample size, lead to relatively safe models. (See Figure 7, below.)

There is a fuzzy area which is in-between safe and risky.

It should be noted that models built on samples of only a few thousand observations, although they carry higher risk, are not necessarily bad. There are many circumstances in which very good models are built on much smaller samples. This is particularly true when the focus of the exercise is to explain relationships across variables. However, in the direct marketing




problem, predictive accuracy is the primary goal, and a model built on such a small sample is often, though not always, going to be somewhat short on predictive accuracy.

Ex Post Analysis Guidelines

This section lays out some guidelines that can be useful once a model has been developed. At this stage, a lot of time is spent teasing out additional performance from the data. The guidelines in the chart below can be useful in determining the likely extent of additional lift possible from the dataset through re-sampling. Please note that this assumes that modeling techniques have otherwise been optimally applied; the focus is on the adequacy of the sample in extracting the maximum lift possible from the entire modeling dataset. The techniques used are similar to those used for the ex ante analysis guidelines. The label definitions and associated likelihoods are also similar.

We observe that very low stability ratios lead to a high risk of deviation from the optimal lift. Furthermore, higher observed lifts are relatively safe, and the highest lifts are very safe, provided the stability ratio is not too low. High stability ratios make the model very safe. (See Figure 8, above.)

There is again a fuzzy area where no direct inference could be made. However, some additional inference is possible using the ex ante guidelines. The above results, for both the ex ante and ex post analysis, have been intentionally kept conservative to account for the limited number of datasets from which they are derived. However, judicious use of these principles, tempered with business context and good judgment, can prove to be beneficial. At the very least, practitioners should start looking at the stability ratio to judge the adequacy of their samples.

Conclusion

This case study addresses the issue of the adequacy of the selected sample size for a direct-marketing-type targeted modeling exercise. While we found that higher event rates increase Type II error but reduce Type I error, we did not find any statistically robust relationship between these errors and sample size. This could be a manifestation of the limited sample size used in the case study, and should be verified with a larger sample.

With respect to model lift, we find that there is no single appropriate sample size; it depends on the number of variables in the dataset. We then provide some guidelines that can help avoid major pitfalls during the modeling exercise.

It should be noted that the results are based on a wide-ranging but limited number of datasets included in the study, and it would be useful to further generalize them in future studies. ■


Clustering Before Classification for Improved Response Rates

Steve Gallant, Ph.D. and Robert Cooley, Ph.D., KXEN

Abstract

We investigate Clustering Before Classification (CBC) to improve response rates in the top decile of scores. The approach entails a preliminary clustering step, in which the data is split into five clusters using a targeted clustering algorithm, followed by a separate modeling step applied to each cluster. We find that CBC has the potential to improve performance in the top deciles of scores, but has little effect on lower deciles. By using analytical software that is highly automated, the additional effort for the extra modeling steps becomes reasonable for important modeling problems where the focus is on performance in the top deciles of scores.

Introduction

Building classification models is the most widely used modeling technique for marketing and customer relationship management, because such models provide answers to important questions:

• Who is likely to respond to a promotion?

• Which customers are likely to leave and go to competitors?

• Who is likely to become a high-profit customer?

Here we investigate a general method for improving accuracy of classification models by performing an initial clustering on the data, and then building a separate classification model for each cluster. Our goal is to develop a general way to potentially improve modeling accuracy without incurring greatly increased costs in human effort or computation time. In other

words, we seek a very practical way to increase the value of models we build.

We will first describe some important related concepts that bear on this Clustering Before Classification (CBC) approach; next, we will describe the approach in detail; and finally, we will summarize some experiments that test CBC using actual marketing databases.

Related Concepts

CBC makes contact with five important notions in model building.

1. Two-Stage Models

In two-stage models, a stage 1 model is built whose outputs either become inputs to a stage 2 model or are used to select a stage 2 model. Multi-stage models are nothing new in statistics. The Predictive Modeling Markup Language (PMML) specification refers to multi-stage models as Model Composition [1]. The use of the stage 1 model outputs as inputs into a second stage is referred to as Model Sequencing by the PMML specification, and the use of stage 1 outputs to select the appropriate stage 2 model is referred to as Model Selection. Both decision trees and neural network models can be viewed as multi-stage models. The advantage of such models is that they are higher-order models that can fit the training data better by using a richer set of functions.

There are two potential disadvantages when using multi-stage models. The first is that multi-stage models are more prone to overfit the data and to multiply the variance



and errors of the models. Thus, we need to be careful to have a sufficiently large dataset before using such methods. The second disadvantage is that, depending on which methods are used, multi-stage models can take much more time to properly train and deploy.

2. Higher-Order Features

Logistic and linear regression modeling are widely used for building marketing models, but such models contain terms that involve only a single variable at a time. For example, a model to predict overweight people on the basis of height and weight will need to use higher-order features, because weight, in the absence of height information, is not sufficient to determine whether an individual is overweight. In cases such as this, where inputs must be combined and related to one another to be meaningful, modelers must explicitly add such compound terms or they will not be available to the model.
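To make the point concrete, here is a small synthetic illustration (not from the article; the data and the BMI-style ratio feature are assumptions) showing a compound term being added explicitly so that a logistic regression can use it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
height_m = rng.normal(1.70, 0.10, n)                 # heights in meters
weight_kg = rng.normal(75, 12, n)                    # weights in kg
bmi = weight_kg / height_m**2                        # compound (higher-order) feature
overweight = (bmi > 25).astype(int)                  # label depends on the ratio, not either input alone

X_raw = np.column_stack([height_m, weight_kg])       # single-variable terms only
X_aug = np.column_stack([height_m, weight_kg, bmi])  # compound term added explicitly

for name, X in [("raw terms only", X_raw), ("with compound term", X_aug)]:
    model = LogisticRegression(max_iter=1000).fit(X, overweight)
    print(name, "training accuracy:", round(model.score(X, overweight), 3))
```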

With respect to higher-order features, one nice characteristic of clustering models is that clusters are typically defined by a collection of key features. Although cluster models can also be used for classification, this is seldom advisable, because clustering models almost never provide as strong classification models as algorithms that build classification models directly. Thus, a key motivation for our approach is to obtain the higher-order features from clustering models, and use them in classification models to further improve classification performance, yet without overfitting the data.

3. Targeted Clustering

The classic definition of clustering is that it is a completely unsupervised learning method. However, there is a notion of targeted clustering or supervised clustering, where the distance metric used for determining the clusters is domain-specific. In the context of response modeling, we can replace each input feature (including strings) by the response rate for a particular category or range of values. This defines the distance metric for calculating closeness in terms of response rates, instead of some generic distance metric, and thereby gives a principled way to normalize distances across all of the input features. The use of decision trees and algorithms such as CHAID [3] to segment a population based on a dependent variable is very similar in concept to targeted clustering. The difference is that targeted clustering makes use of true clustering algorithms that segment a population based on a multivariate distance metric.
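As a sketch of this recoding idea (illustrative only; the software used in the paper performs the recoding internally, and column names here are placeholders), each feature value can be replaced by the observed response rate of its category or value bin before clustering:

```python
import pandas as pd

def target_rate_encode(df, target, n_bins=10):
    """Replace each feature value with the mean response rate observed for
    its category (strings) or its value bin (numerics)."""
    encoded = pd.DataFrame(index=df.index)
    for col in df.columns:
        if col == target:
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            groups = pd.qcut(df[col], q=n_bins, duplicates="drop")
        else:
            groups = df[col]
        encoded[col] = df.groupby(groups)[target].transform("mean")
    return encoded

# The encoded frame can then be fed to any standard clustering algorithm;
# distances are now expressed in response-rate units for every input feature.
```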

4. Submodeling

Another theme touched upon by our approach is Submodeling, or Micro-Modeling. For example, if we are interested in modeling response to a promotion, the region of the country may be so important that we must create different models for each region in order to have high predictive performance. Building such sub-models can be especially important in domains such as risk modeling.

By using clustering to obtain a preliminary decomposition of the data into clusters, and then building separate models for each cluster, we are using the Submodeling approach to try for better-fitting models on the homogeneous clusters, rather than attempting to model the entire population with a single classification model. Berry and Linoff briefly refer to this as Piecewise Regression using Trees [4], where decision trees are used to segment the population instead of clustering.

5. Focus on the Top Quantiles (“Asymmetric Modeling”)

For many, perhaps most, classification modeling tasks, we are only interested in the top decile or two of customers after sorting by scores. For example, with a mail promotion, we will only be mailing to individuals with the highest model scores, because they are most likely to buy. Therefore, we are very interested in the top decile or two of sorted scores, and not very interested in how well the model performs on the bottom deciles. For this reason, we call such problems Asymmetric Modeling problems.

CBC gives one easy way to perform Asymmetric Modeling in which we merely ignore clusters with low percentages of targets and model on the rest of the data. However, in our experience, this approach performs no better than using all clusters for modeling. So we limit our experiments to using all clusters.

Computational Method

We now describe the computational approach we selected for our tests, discuss some important variations, and specify the details of the actual software implementation.

Algorithms

The overall algorithm is fairly straightforward (a code sketch follows the listed steps):

1. Split the data randomly into Training (80%) and Test (20%) sets.

2. Cluster the Training data into five clusters using a targeted clustering algorithm.

3. Apply the clustering model to the Training and Test data to give each item a cluster number.

4. For each cluster in the Training data:

   a. Build a good classification model that can also output target probabilities.

   b. Select the Test data that is in this corresponding cluster, score it with probabilities using the corresponding classification model, and append the scored data to the final scored Test output.

5. Compute statistics on the final scored Test output.
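The sketch below illustrates these steps with off-the-shelf components (scikit-learn); plain k-means stands in for the targeted clustering algorithm and gradient boosting for the per-cluster classifiers, so it shows the workflow rather than the specific KXEN implementation used in the paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def cbc_scores(X, y, n_clusters=5, random_state=0):
    """Clustering Before Classification: cluster the training data, then fit one
    probability-producing classifier per cluster and score the test data."""
    # 1. Split into Training (80%) and Test (20%) sets.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.20, random_state=random_state)

    # 2-3. Cluster the training data and assign cluster numbers to both sets.
    #      (Plain KMeans here; the paper uses a *targeted* clustering algorithm.)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X_tr)
    c_tr, c_te = km.predict(X_tr), km.predict(X_te)

    # 4. Build one classifier per cluster and score the matching test records with
    #    probabilities, so scores are comparable across clusters.
    #    (Assumes each cluster's training slice contains both classes.)
    proba = np.zeros(len(X_te))
    for c in range(n_clusters):
        clf = GradientBoostingClassifier(random_state=random_state)
        clf.fit(X_tr[c_tr == c], y_tr[c_tr == c])
        proba[c_te == c] = clf.predict_proba(X_te[c_te == c])[:, 1]

    # 5. Return the scored test output for statistics (lift, AUC, etc.).
    return proba, y_te
```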

There are some important considerations to take into account when doing CBC. First, it is highly desirable to use a targeted clustering algorithm for the clustering. This will ensure that some of the clusters have high concentrations of the target, some have low concentrations, and some are about average. By contrast, if untargeted clustering is used, then it is quite possible for each of the clusters to have about the same frequency of targets, which would limit the usefulness of the initial clustering step.

Second, it is important that the classification models be capable of producing approximate probabilities, not just scores. This is a necessity for comparing records that lie in different clusters and that therefore received scores from different classification models. Any monotonic function of probabilities that is consistent across classification models will also work, for example, log-likelihood estimates.

Third, we have chosen to build classification models on each of the clusters. An alternative is to group the clusters and to build models on each group. A special case of this occurs with Asymmetric Modeling, where we might keep the top clusters and build only a single model on just those top clusters (provided performance is as good as building individual models on each cluster).

Implementation Details

We selected five clusters for our experiments as a trade-off between getting enough segments to see an improvement over straightforward classification and having enough data in each cluster to get reliable second-stage models. Experimenting with different numbers of clusters in the first stage is beyond the scope of this paper.

We selected the KXEN Analytic Framework for all modeling tasks because it has all of the key features we need:

Targeted Clustering: Targeted clustering algorithms specify one field as being most important. For our experiments, it is natural to make this field the target for our classification modeling. The modeling software we used has a nice way of normalizing clustering distances across diverse fields (continuous fields with arbitrary scaling factors, Booleans, and character strings) by initially recoding all inputs in terms of target probabilities. This gives a principled way to compare, for example, continuous fields and character string fields.

Automation: There are repetitive aspects to CBC involving similar operations for each cluster. It was very helpful that the modeling software automatically generates scripts so that processing can be run in batch mode, rather than having to use a GUI repeatedly for each cluster and for each dataset.

Classification Produces Probability Scores: The modeling software can directly estimate probabilities (using splits of the training data) and incorporate probability estimates into output scores.

Overfitting Prevention and Reporting: The modeling software we used automatically attempts to prevent overfitting when training models but, more importantly, reports when it has been unsuccessful. Since one of the drawbacks of the CBC approach is that the training data available for an individual cluster may not be enough to train a reliable second-stage classification model, this feature is very important for identifying a fast fail.

These features allowed us to perform clustering, classification submodeling, scoring, and statistics gathering with only a modest estimated increase in effort as compared to using straight classification modeling on the entire dataset. By the end of our experiments, for a new dataset we were building the cluster model, the full classification model, and five sub-classification models, and gathering statistics, all in one to two hours.

Datasets and Results

Table 1 (page 12) summarizes the datasets used for experimenting with the CBC modeling method.

In addition to the above datasets, we looked at two other datasets. One was a Retail dataset for which KXEN reported that even the simple Classification model was overfitting the data; applying CBC worsened the overfitting by dividing the data even further. The second was a Banking dataset on which we built models for four different targets.


Again, the individual models built on clusters were signaled to be unreliable, although three of the four models had CBC doing considerably better than straight Classification, and the fourth model significantly worse.

Table 2 (above) gives comparisons between straight Classification and CBC.

This table shows the results for the five datasets that provided reliable results. We show the following measurements for each of straight Classification and CBC:

• Area Under the Curve (AUC), a standard measure of fit over the entire test dataset.

• Percentage of all targets in the top 5%, 10%, 20%, and 30%, after sorting by predicted probability of being a target.

Analysis

CBC, as expected, performs best as a means of improving lift in the top 5% and 10% of the output lists. Looking at the top 5% results, the fraction of targets improved noticeably in three out of five cases, while results were similar in the other two cases.

For the top-scoring 10% of the cases, results improved noticeably in one out of five cases, with similar results in the other cases.

By the time we get to the top 20% of the scores, the differences between CBC and straight Classification are small in all cases.

Looking at the datasets, it is worth noting that the two datasets that did not provide reliable results for testing the CBC method


each had fewer than 100,000 rows of training examples. This suggests that larger training datasets than might normally be acceptable for straight classification may be necessary for CBC modeling. This is especially true when the proportion of targets is small, because the number of targets in individual clusters can become too small for reliable modeling. Extra care and checking are required in such cases.

Our Conclusion

We conclude that Clustering Before Classification has the potential to improve results in the top quantiles, but not in all cases. Thus, it is advisable to try CBC in addition to straight Classification, rather than simply relying on CBC when the focus is on the top decile.

Trying both the Classification and CBC approaches requires sufficient automation to make this much modeling practical, but we found that the KXEN Analytic Framework nicely solves this problem through its built-in modeling, data, and statistics functionalities, plus the ability to automatically generate and run scripts. Each of the above datasets required a clustering model, a baseline Classification model, and five more classification models for CBC, so we ended up building dozens of models in total, but taking only 20 to 30 hours to produce them.

Thus, by taking advantage of automation, it becomes practical to build models using both Classification and CBC when performance in the top decile of scores is important. ■

References

1. Data Mining Group (DMG), Predictive Modeling Markup Language (PMML), http://www.dmg.org

2. M. Hornik, E. Marcade, S. Venkayala, Java Data Mining: Strategy, Standard, and Practice, Morgan Kaufmann, 2006.

3. G. Kass, "An Exploratory Technique for Investigating Large Quantities of Categorical Data," Applied Statistics, v. 29, 1980.

4. M. Berry, G. Linoff, Data Mining Techniques for Marketing, Sales, and Customer Relationship Management, Wiley, 2004.

5. Knowledge Extraction Engines (KXEN), http://www.kxen.com


Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models

Nicholas J. Radcliffe, Ph.D., Portrait Software and the University of Edinburgh

Abstract

Various authors have independently proposed modeling the difference between the behavior of a treated and a control population, and have used this as the basis for targeting direct marketing activity. We call such models Uplift Models. This paper reviews the motivation for such an approach and compares the various methodologies put forward. We present results from using uplift modeling in three real-world examples. We also introduce quality measures appropriate for assessing the performance of uplift models, for both binary outcomes (purchase, attrition, click, default) and continuous outcomes (spend, response, size or value lost). Finally, we discuss some of the challenges faced when building uplift models and suggest some key directions for future research.

Introduction

It is standard practice to employ control groups to allow post-campaign assessment of the incrementality, or lift, of marketing actions. There is also limited but growing recognition that campaigns should ideally be targeted so as to maximize predicted lift. This approach has been suggested, apparently independently, in at least five published papers [1, 2, 3, 4, 5]. Various modeling methods have been suggested in those papers, including paired regression models and decision trees with modified build algorithms. All of these approaches recognize that traditional response models actually predict either a conditional probability, such as

P (purchase | treatment), (1)

where P(A | B) denotes the probability of A given B, or

sometimes, as in the case of attrition modeling, one such as

P (attrition | no treatment). (2)

Clearly, the traditional approach does not model true response, i.e., the change in behavior resulting from the action.

As a result, all the papers referenced suggest methods for predicting uplift, which we define, for demand generation applications, as

P (purchase | treatment) – P (purchase | no treatment). (3)

This would appear to be the ideal basis for targeting in many circumstances.

This paper reviews all of the known published approaches and presents results from a range of real-world problems. We also discuss the assessment of uplift models, and introduce a family of quality measures (the Qini measures: Q, q0 and Qc) based on generalizations of the familiar Gini coefficient.

Real-world applications discussed include retention modeling for telecommunications operators (modeling “savability”), cross-selling in banking (modeling incremental account opening), and deep-selling in retail (modeling incremental revenue).

Motivation

A fundamental problem in motivating uplift modeling is that the traditional term used for the conventional approach is “response modeling,” which sounds exactly the same as uplift modeling. The very name “response” is strongly suggestive of the idea that a particular marketing action caused the



response. However, the typical practice in response modeling consists of:

1. Choosing a target population;

2. Possibly holding back a randomized control group to allow post-campaign assessment of incrementality, or uplift;

3. Treating the target group (minus any control group);

4. Recording those people who take some desired action during an outcome period as responders (possibly with reference to a campaign code);

5. Building a response model on those customers subjected to the action, to understand the variation in outcome (response); and

6. Possibly assessing the incrementality, or uplift, by comparing the overall level of the desired outcome (response) in the treated and control groups.

The resulting model estimates a conditional probability, such as P(purchase | treatment) (equation 1), rather than the change in probability resulting from an action, or true response,

P (purchase | treatment) – P (purchase | no treatment). (3 bis)

This difference clearly lies behind the references to “true response” and “true lift” in the titles of two of the publications.

Of course, response models are not necessarily used directly and naively; one particularly common approach is to weight a modeled response probability by some kind of value (such as purchase size or a customer value). While sensible, this does not alter the underlying weakness of the traditional approach. Fundamentally, if we wish every unit of marketing spend to achieve the largest possible change in customer behavior (however measured), then we need to model exactly that: the change in behavior that results from our actions. A traditional response model (equation 1) does not do this.

Review of Approaches

There are two broad classes of approaches to building uplift models in the literature: those based on trees, and those based on additive regression models.

3.1 Tree-Based Methods

The first two papers published on uplift modeling both came out in favor of tree-based approaches. There are three main classes of tree-based methods in common use for simple prediction: Classification and Regression Trees (CART), Quinlan's C4.5 (and its predecessors), and the AID/CHAID family. The first two are based on greedy, divisive methods that start with the whole population, consider a family of binary splits, assess the utility of each split using a quality measure (an impurity measure, such as variance, Gini, or information gain), and then recurse. Various cross-validation methods are then commonly used to right-size the tree through pruning. AID and CHAID are slightly different, and result in non-binary trees, but again are fundamentally controlled by a single measure of split quality, in this case χ². Both Radcliffe & Surry and Maxwell Chickering & Heckerman proposed changes to the split criterion for the tree to focus on the difference in outcome between the treated and control populations. In the former case, the population size of the two resulting segments was also taken into account, while in the latter case this difference in outcome was used directly as the split criterion.

3.2 Regression-Based Methods

The remaining three papers focused primarily on regression-based approaches to modeling uplift.

The methods proposed by Hansotia & Rukstales [3] and Manahan, and also discussed by all the other authors, are equivalent, and consist of simply building two independent regression models, one for the treated population and one for the control population, and subtracting these. This obviously has a number of attractions, including the vast body of literature on, and experience with, regression and its proven pedigree. Several of the authors, however, express a concern that, because the two models are independent, there is no direct attempt to fit the difference in behavior between the two populations: while it is clear that if the models were perfect their difference would accurately predict uplift, it is much less clear what properties we should expect of the difference when the models are imperfect.
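A minimal sketch of this two-model approach (illustrative, not drawn from any of the cited papers): fit one model to the treated records and one to the controls, and take the difference of the predicted probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def two_model_uplift(X, y, treated):
    """Two-model uplift: P(outcome | treatment) - P(outcome | no treatment),
    estimated with two independently fitted models."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    treated = np.asarray(treated, dtype=bool)

    model_t = LogisticRegression(max_iter=1000).fit(X[treated], y[treated])    # treated population
    model_c = LogisticRegression(max_iter=1000).fit(X[~treated], y[~treated])  # control population
    return model_t.predict_proba(X)[:, 1] - model_c.predict_proba(X)[:, 1]
```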

Lo suggests a subtle variation on this regression theme. Rather than simply fitting two independent regression models, he proposes building a single regression with an extended set of independent variables. If the original n independent variables are denoted x = (xi for i = 1, 2, ..., n), Lo proposes regressing over (x, tx, t), where, for each customer, t is 1 if the customer was treated and 0 if the customer is in the control group, and tx denotes the corresponding vector of treatment-interaction terms (tx1, tx2, ..., txn).

Lo's method is really a meta-method that can be applied to any modeling approach, but regression is the principal case he considers, together with shorter discussions of layered feed-forward neural networks (multi-layer perceptrons) and naive Bayes models.


This produces a model that separates out regression coefficients for the main effects (the coefficients of the xi), the basic treatment effect (the coefficient for t), and the uplift effects (the coefficients for the interaction variables txi). The result is a model with the functional form f(x, tx, t), which is scored by computing f(x, 1x, 1) – f(x, 0x, 0) for each customer.
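A corresponding sketch of Lo's interaction formulation (again illustrative): a single logistic regression over (x, t·x, t), scored as the difference between its treated and untreated predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lo_interaction_uplift(X, y, treated):
    """Single-model uplift with explicit treatment interactions:
    regress y on (x, t*x, t) and score f(x, 1*x, 1) - f(x, 0*x, 0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    t = np.asarray(treated, dtype=float).reshape(-1, 1)

    design = np.hstack([X, t * X, t])                    # (x, t*x, t)
    model = LogisticRegression(max_iter=1000).fit(design, y)

    ones, zeros = np.ones_like(t), np.zeros_like(t)
    scored_treated = model.predict_proba(np.hstack([X, X, ones]))[:, 1]       # f(x, 1*x, 1)
    scored_control = model.predict_proba(np.hstack([X, 0 * X, zeros]))[:, 1]  # f(x, 0*x, 0)
    return scored_treated - scored_control
```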

Lo argues against the more straightforward approach of simply subtracting two independent regression models, as proposed by other authors [3], on the basis that the estimated lift can be sensitive to statistically insignificant differences in the parameters of the treatment and control models.

Measuring Performance

Before looking at the results of building uplift models, it is important to consider the question of quality measures for such models, a subject which does not appear to have been discussed in the literature.

In any specific targeting situation, there may be a clear goal, such as maximizing the total profitability of a campaign, perhaps on the basis of a net present value model, or maximizing some non-monetary outcome such as the reduction in attrition achieved. However, just as quality measures such as Gini, R², the Kolmogorov-Smirnov statistic, and occasionally even classification errors are useful for understanding the overall power of conventional models, it is desirable to have access to overall statistics that summarize the potency of an uplift model.

Many of the performance measures for traditional models depend fundamentally on a comparison of actual and predicted outcomes at the level of an individual. These are intrinsically unsuitable for generalization to the case of uplift, because we can never have a definitive measure of uplift for an individual, since no one can simultaneously be treated and not treated.

Uplift can therefore only be estimated on a per-segment basis, and the estimated uplift for an individual is generally different when estimated with respect to different segmentations. Partly for these reasons, we take the Gini coefficient as our starting point and propose a family of measures under the umbrella name of Qini coefficients.

4.1 The Gini Coefficient for Conventional Models

There are many ways of defining Gini, but the most convenient starting point for this work is to define it with reference to the familiar gains chart, which is among the most common ways of assessing traditional models. Consider a demand-generation application in which the desired outcome is a fixed purchase. The gains chart is constructed by first sorting the population, from best to worst, by the score in question. The graph then shows the number of responses achieved (vertical axis) as a function of the number of people treated (horizontal axis). A perfect model assigns higher scores to all of the purchasers than it does to any of the non-purchasers. Thus the perfect model first climbs at a gradient of one response per person treated, reflecting the fact that all purchases are assumed to be caused by treatment. After the last purchaser has been accounted for, the graph proceeds horizontally, as shown below. In contrast, random targeting results in a diagonal line from (0, 0) to (N, n), where N is the population size and n is the number of purchases achieved if everyone is targeted. Real models usually fall somewhere between these two, forming a broadly convex curve above the diagonal, as shown, while a reversed model that tends to assign better scores to non-purchasers than purchasers will fall below the diagonal (Figure 1, left).

The Gini coefficient is simply the ratio of the area above the diagonal of the actual curve to the corresponding area above the diagonal of the optimum curve. A perfect model therefore has a Gini of 1 (or 100%); a model of little or no predictive power will have a Gini close to zero; and an inverted model will have a negative Gini, which can be as low as -1 (or -100%).


The Gini coefficient is more commonly defined with reference to the receiver operating characteristic (ROC) curve, also known as the Gini curve, which plots the number of non-responders, rather than the number of people treated, on the horizontal axis. This can be shown to be equivalent, but is a less natural starting point for the case of uplift models.
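
For concreteness, here is a small sketch of the gains-chart construction and the resulting Gini coefficient for a conventional binary-response model; it uses numpy, and the function and data names are ours rather than anything defined in the paper.

```python
import numpy as np

def gini_from_gains(scores, responses):
    """Gini: area above the diagonal of the model's gains curve, divided by
    the corresponding area for the perfect model."""
    order = np.argsort(-np.asarray(scores))          # sort best to worst by score
    y = np.asarray(responses)[order]
    cum_resp = np.concatenate([[0], np.cumsum(y)])   # responses achieved (vertical axis)
    N, n = len(y), y.sum()
    treated = np.arange(N + 1)                       # people treated (horizontal axis)
    diagonal = treated * n / N                       # random-targeting line
    perfect = np.minimum(treated, n)                 # perfect model: 45 degrees, then flat
    area_model = np.trapz(cum_resp - diagonal, treated)
    area_perfect = np.trapz(perfect - diagonal, treated)
    return area_model / area_perfect

# Tiny illustration: a noisy score that is positively related to response.
rng = np.random.default_rng(1)
y = rng.binomial(1, 0.1, size=5000)
s = y + rng.normal(scale=2.0, size=5000)
print(round(gini_from_gains(s, y), 3))
```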

4.2 The Qini Coefficients Q and q0 for Uplift Models

We now introduce the Qini coefficients, which are natural generalizations of the Gini coefficient to the case of uplift. Again, we start by drawing a graph, which we call either the Gains Chart for Uplift or, more simply, a Qini curve. This is the same as a gains chart, except that the vertical axis now shows the cumulative number of incremental sales achieved, or the uplift. This is estimated on a per-segment basis by comparing the purchase rate in the treated group and the corresponding control group. The estimated number of incremental sales in a segment is given by Rt - Rc (Nt / Nc), where Rt and Rc are the number of purchases in the treated and control groups respectively (within the segment in question), and Nt and Nc are the corresponding total sizes of the treated and control groups within the segment.

Were it not for the possibility of negative effects, we would then proceed as before. In reality, the potential for negative effects introduces significant complications. Suppose, for simplicity, that in addition to a treated group of 00,000 with an overall purchase rate of 30%, there is a control group of 00,000 with a purchase rate of 10%. The overall uplift is clearly twenty percentage points. It is certainly possible that the effect of the treatment was simply to persuade 0,000 people to buy who otherwise would not have done so. But it is also possible that up to 0,000 people who would have bought without the campaign (as estimated by the control group) were caused not to purchase by the campaign. These, of course, would have to be balanced by 0,000 more people who were persuaded to purchase who would not have done so otherwise. In this case, we would end up with a Qini curve as shown in Figure 2 (above).

The extent to which it is theoretically possible to exceed the actual uplift observed is usually limited by the purchase rate in the control group. When there is a relatively low purchase rate in the control group, as here, the limit is that all excess positive effects must be balanced by negative effects, and the worst a treatment can do is to prevent the purchases that would otherwise have occurred, as quantified by the purchase rate in the control group. Therefore, the uplift can never exceed the overall uplift by more than the purchase rate in the control group.

In rarer cases, the limiting factor is simply that the purchase rate can never exceed 100%. Thus, for example, if the treated and control purchase rates are % and % respectively, clearly the highest possible uplift is 00 percentage points.

Given these observations, we can now define the Qini value Q, for binary outcomes, in the same way as the Gini coefficient, i.e., as the ratio of the area of the actual uplift gains curve above the diagonal to that of the optimum Qini curve, shown as the solid red line in Figure 2. As with the Gini coefficient, this theoretically lies in the range [-1, 1], though, because the uplift can only ever be approximated, actual calculations may occasionally lie slightly outside this range.

There are some calculational subtleties with constructing the best Qini curve, which result largely from the fact that uplift estimates are not strictly additive. One consequence is that it is usually more accurate to estimate the cumulative uplift at each point from zero to that point directly, rather than accumulating a set of uplifts.


Note also that, whereas in the conventional setting it is clearly possible to order the customers in such a way as to achieve the optimal Gini (simply by sorting all the responders ahead of the non-responders), it is rarely clear whether such an optimal ordering actually exists for the case of uplift.

When the overall uplift is non-zero, it is also sometimes convenient to define the little q0 value as the ratio of the area of the Qini curve above the diagonal to the area above the diagonal of the zero-downlift optimum Qini, which is the maximum Qini curve that can be achieved without invoking negative effects. This measure in some ways behaves more like a conventional Gini coefficient, except that it is possible, and not uncommon, for it to exceed 100%.

The case of zero or near-zero overall uplift is also interesting because it emphasizes that, where treatment has negative effects on some portions of the population, even a campaign with no overall lift may contain segments in which the treatment is effective. This leads to optimum Qini curves such as that shown in Figure 3. As the overall uplift tends to zero, q0 values tend to ∞, so big Q values are much more useful in these cases (see Figure 3, below).
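
To make the construction concrete, here is a simplified numpy sketch of a Qini curve and of little q0 measured against the zero-downlift optimum. The exact handling of the optimum curve and of ties is more subtle in practice (see the footnote above), so treat this as an approximation rather than the authors' implementation; all names are ours.

```python
import numpy as np

def qini_curve(scores, y, treated):
    """Cumulative estimated uplift (incremental sales) as targeting depth grows.
    The uplift at each depth is estimated directly from zero to that point,
    rather than by accumulating per-segment estimates."""
    order = np.argsort(-np.asarray(scores))
    y, treated = np.asarray(y)[order], np.asarray(treated)[order]
    n_t, n_c = np.cumsum(treated), np.cumsum(1 - treated)
    r_t, r_c = np.cumsum(y * treated), np.cumsum(y * (1 - treated))
    with np.errstate(divide="ignore", invalid="ignore"):
        uplift = r_t - np.where(n_c > 0, r_c * n_t / n_c, 0.0)
    frac = np.arange(1, len(y) + 1) / len(y)         # proportion targeted
    return np.concatenate([[0.0], frac]), np.concatenate([[0.0], uplift])

def little_q0(scores, y, treated):
    """q0: area above the diagonal of the Qini curve relative to the
    zero-downlift optimum, which climbs one incremental sale per person
    targeted until the overall uplift is exhausted, then runs flat."""
    x, u = qini_curve(scores, y, treated)
    N, U = len(np.asarray(y)), u[-1]
    people = x * N
    diagonal = x * U
    optimum = np.minimum(people, U)
    return np.trapz(u - diagonal, people) / np.trapz(optimum - diagonal, people)
```

Big Q is computed in the same way, but with the full optimum Qini curve, including negative effects, in the denominator.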

4.3 Continuous Outcomes

For the non-binary case, we again start by reviewing the construction of a gains chart and the computation of Gini from this, and then generalize. Again, without loss of generality, we will focus on the case of a purchase, but now we are interested in the total value of incremental purchases, rather than their number. The gains chart for a continuous outcome is the same as that for a binary outcome except that the vertical axis shows cumulative value, rather than the cumulative count of purchases, as a function of the proportion of the population targeted. Since it is possible that the entire purchase value comes from a single customer, the optimal gains curve for a continuous outcome rises essentially vertically, rather than at 45 degrees, when scaled naturally. Again, the Gini is simply calculated as the ratio of the areas above the diagonal of an actual model and the optimum model, and again lies in the range [-1, 1].

The problem we are then faced with in generalizing this case to handle incremental purchases is that there is no bounding optimal Qini curve for the continuous case. This is because the availability of negative effects means that we could, in principle, have arbitrarily large positive and negative effects that cancel.

This lack of any well-defined optimum Qini curve does not affect q0, which can be defined as before, as the ratio of the areas above the diagonal of the actual uplift gains curve for a model and the zero-downlift Qini curve, which is now simply upper triangular. Unfortunately, however, as before, this is not satisfactory as the only Qini coefficient, because it is not well defined if the overall uplift is zero, and is extremely unstable if the overall uplift is small.

We therefore need to find a value similar to Q for the continuous case. We do this simply by dividing the area above the diagonal of our uplift gains curve (Qini curve) by half the square of the total number of customers, and we call the resulting value Qc. Rather than a dimensionless quantity, this value is in the units of whatever the outcome is measured in, commonly money, and can be thought of as a per-head figure. So if we have an overall uplift in revenue of ,000,000 when we target ,000,000 people, the zero-downlift Qini has a Qc value of .00, or .00 per head. There is clearly a degree of arbitrariness in the scaling of this quantity, but the advantage of dividing by half the square of the population size is that the measure becomes independent of population size and is scaled in a way that is consistent with Q and q0 (see Figure 4, pg. 19).
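
A correspondingly short sketch of Qc, under the same simplifying assumptions as the Qini sketch above (our naming, not the authors'):

```python
import numpy as np

def qc(scores, value, treated):
    """Qc for a continuous outcome: area above the diagonal of the cumulative
    incremental-value curve, divided by N^2 / 2, giving a per-head figure in
    the outcome's own units (e.g., revenue per head)."""
    order = np.argsort(-np.asarray(scores))
    v, t = np.asarray(value, dtype=float)[order], np.asarray(treated)[order]
    n_t, n_c = np.cumsum(t), np.cumsum(1 - t)
    v_t, v_c = np.cumsum(v * t), np.cumsum(v * (1 - t))
    with np.errstate(divide="ignore", invalid="ignore"):
        uplift = v_t - np.where(n_c > 0, v_c * n_t / n_c, 0.0)
    N = len(v)
    people = np.arange(1, N + 1)
    diagonal = people * uplift[-1] / N
    return np.trapz(uplift - diagonal, people) / (0.5 * N ** 2)
```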

5. Results

We present three examples of the application of uplift modeling to real-world, commercial problems.


All of these problems were tackled using various implementations of Radcliffe & Surry's approach as it developed, using commercial software developed by Quadstone and now marketed by Portrait Software.

5.1 Example 1: Deep-Selling

A retailer used a catalogue mailing to drive greater spend activity among active customers, an example of deep-selling.‡ The customers were selected on the basis of a conventional response model (the so-called champion model) built on data from customers mailed in a similar previous campaign. Approximately 00,000 of those ranked as likely high responders by the model were targeted, and approximately 0,000 were held back as a control group. On average, spend increased by per head among those mailed. An uplift model was then built, using data from the same campaign as that used to construct the champion model. In contrast to the champion model, however, the uplift model used information about the control group as well as the mailed group.

Graph 1 (left) shows the Qini curve for uplift in spend. The blue (dashed) line shows the result of targeting with the champion model and the red (solid) line shows the effect of targeting with the uplift model. Up to about 0%, the two models perform similarly. Thereafter, the models diverge. For example, if 0% are targeted, the uplift model manages to identify customers delivering about % more revenue than the champion model (approximately . against 0. per head). And if 0% are targeted, the uplift model manages to retain % of the incremental spend ( . 0 per head), against only % ( . 0 per head) for the champion model. Clearly, the uplift model is significantly better at identifying customers for whom marketing spend generates a positive return.

5.2 Example 2: Retention

A mobile phone company was experiencing an annual churn (attrition) rate of approximately % in a segment.

‡ While cross-selling tries to sell new products to customers, and up-selling aims to drive customers to upgrade, deep-selling simply tries to increase the frequency or size of their transactions.

Because the results shown here are from real companies, some results have been systematically rescaled to protect client anonymity and confidentiality. In all cases where this has been done, the scaling has been chosen to reduce the overall impact of the claims made, never to exaggerate them.

Both results are shown for validation holdout data.


It targeted the entire segment with a retention offer, holding back only a control group. The net result was an increase in churn to 0% among the treated group, while the churn rate in the untreated group remained at %. Obviously, this is the exact opposite of the desired effect, but we have witnessed this phenomenon repeatedly. It seems that retention interventions often backfire because they variously remind customers of their ability to terminate, provide a catalyst to help overcome inertia, and annoy customers through intrusiveness.

Clearly, one way to improve the situation is to stop this retention offer entirely. However, there was a strong belief within the business that the offer was valid and did work for some groups of customers. Also, no more successful approach had been identified. An uplift model was therefore built to try to identify a sub-segment within which the treatment was effective.

Graph 2 (page 19) is a Qini curve, but the goal is now to achieve the greatest possible reduction, i.e., a negative increase, in churn. As usual, the horizontal axis shows the proportion of the population targeted, while the vertical axis shows the resulting increase in total churn. The mailing was untargeted, so the diagonal (blue) line shows the effect of random targeting of customers. Of course, with such random targeting, extra customers are lost in proportion to the number treated. However, the red (concave) line shows the effect of targeting on the basis of the uplift model. The results are striking.

The model shows that retention activity was effective for about 30% of the customers: if only the 30% identified as most savable by the model are treated, churn across the entire segment falls by . percentage points, from % to . %, i.e., over 3% fewer customers churn. Compared to targeting everyone, churn actually reduces from 0% to . %, a proportionate fall of %. The segment contained approximately million customers, so using an industry-standard ARPU of 00 per year, the financial impact of moving to uplift modeling, for one segment alone, is an improvement of around . million.

5.3 Example 3: Cross-selling High-value Products

We have also tackled a number of cross-sell targeting problems for banks. A typical scenario involves cross-selling activity aimed at increasing product holding. The value of many banking products is high, so that even an increase in product take-up as low as a tenth of a percentage point can provide a positive return on investment for mailings. However, we have shown that, with appropriate targeting, we can usually achieve between 0% and 0% of the same incremental sales while reducing mailing volumes by factors ranging from 30% to 0%. Because the banks themselves have attempted to model uplift, they typically have historical data allowing full longitudinal validation of results.

One of the complicating factors in these scenarios is that the uplift from the campaign is usually significantly smaller than the natural purchase rate of the product being promoted, typically by a factor of five to twenty. Thus, it would not be unusual to see a purchase rate in the control group of around %, and in the treated group of . %. However, the drivers of the base purchase rate are often quite different from those of the incremental purchases resulting from the campaign. Because of this, non-uplift approaches to targeting are often doomed to failure, and sometimes actually perform less well than random targeting.

Graph 4 (top) shows an example of one such campaign. Here, the net effect of the campaign was to increase the uptake of the product by a quarter of a percentage point. However, the uplift model shows that over 0% of the increase in sales comes from just 0% of the targeted population; 0% comes from 0% of the population, and % comes from 0% of the population. Notice also that the champion model produced substantially worse results than random targeting; in fact, in this case, reversing its ranking would have been very much more effective than using its actual output. This suggests that this campaign is effective in stimulating demand from the very people who tend not to purchase without intervention.


6. Observations from Practical Use

In practice, most of the real difficulties with uplift modeling derive from noise. Noise arises for two principal reasons. First, when building uplift models we are attempting to fit the difference between the behavior of two populations; at the simplest level, when we do this, errors add. Secondly, while from a modeling perspective we would ideally choose to have equal and large numbers in the treated and control populations, in practice one is almost always much smaller than the other. This is because most targeting happens either in a trial setting or a roll-out setting. When a treatment is first being evaluated, the treatment volume is typically low to contain risk. Conversely, in roll-out situations, once a treatment is proven, the goal is usually to maximize its impact, and therefore the size of the control group tends to be limited.

Unfortunately, it is the smaller of the two populations that usually most strongly limits the performance of uplift models. The situation is also not helped by the fact that the uplift phenomenon being modeled is often small compared with the absolute outcome rates; for example, as quoted in Section 5.3, we often see uplifts of a tenth of a percentage point on campaigns with an apparent response rate more like %.

We have therefore found it necessary to employ a wide variety of methods to control noise, including careful variable selection and binning methodologies, bagging, stratified sampling, and k-way cross-validation methods.

7. Conclusions and Further Research

Five researchers have independently proposed methods for modeling uplift to allow more appropriate targeting of marketing actions. In this paper, we have demonstrated three real-world examples in which such an approach has proved capable of delivering dramatic improvements in the profitability of marketing campaigns. We have also introduced a family of statistical measures appropriate for evaluating the performance of uplift models in ranking populations by uplift. These are the Qini measures Q, q0, and Qc. We have found these to be extremely useful in comparing and assessing uplift models.

Open research issues, as discussed by Lo, include more complex treatment scenarios where there are multiple treatments or treatment variables, and handling the challenges presented when the selection of the control group was not uniformly random. Detailed comparative benchmarking of competing methods, while subject to all the usual difficulties of achieving fairness, would clearly be valuable. However, our experience over several years strongly indicates that the performance of uplift models on fabricated test data is often a particularly unreliable indicator of likely performance on real-world data. A significant challenge is therefore to find suitable data that can be made publicly available for benchmarking. ■

References

1. N. J. Radcliffe & P. D. Surry. Differential Response Analysis: Modeling True Response by Isolating the Effect of a Single Action. Proceedings of Credit Scoring and Credit Control VI. Credit Research Centre, University of Edinburgh Management School, 1999.

2. D. Maxwell Chickering & D. Heckerman. A Decision-theoretic Approach to Targeted Advertising. Sixteenth Annual Conference on Uncertainty in Artificial Intelligence, Stanford, CA, 2000.

3. B. Hansotia & B. Rukstales. Incremental Value Modeling. DMA Research Council Journal, 2001.

4. V. S. Y. Lo. The True Lift Model. ACM SIGKDD Explorations Newsletter, Vol. 4, No. 2, 2002.

5. C. Manahan. A Proportional Hazards Approach to Campaign List Selection. SAS User Group International (SUGI) 30 Proceedings, 2005.

6. L. Breiman, J. H. Friedman, R. A. Olshen & C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.

7. J. R. Quinlan. Induction of Decision Trees. Machine Learning, Vol. 1, No. 1, 1986.

8. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.

9. G. Kass. An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, Vol. 29, 1980.

10. J. Magidson. The Use of the New Ordinal Algorithm in CHAID to Target Profitable Segments. The Journal of Database Marketing, Vol. 1, 1993.

11. V. S. Y. Lo. Marketing Data Mining: New Opportunities. In Encyclopedia of Data Warehousing and Mining (ed. J. Wang). Idea Group Reference, 2005.

12. L. Breiman. Bagging Predictors. Machine Learning, Vol. 24, No. 2, 1996.


Abstract

This paper illustrates the application of the Ordered Structure Multinomial Logistic Model (MLM) to incorporate additional prospect or customer attributes, whether collected via a marketing research instrument such as an attitudinal survey or supplemented by purchased data from a specialized list provider, for a subset of prospect establishments that were already rank-ordered by pre-existing propensity-to-purchase (acquisition) models for a specific product family. The two-stage modeling process will enable statisticians to enhance the performance of these propensity-to-purchase models, and will also enable marketers to gain additional insights into their prospect databases. We have included sample MLM predictive modeling output to demonstrate various business-to-business (B-to-B) enhanced modeling scenarios.

Introduction

In marketing, a common question to ask is: How will the introduction of new customer/prospect attributes (i.e., new information collected about customer preferences, attitudes, etc.) affect the overall acquisition propensity for a product or product family? For example, an acquisition model, or a suite of acquisition/upsell/cross-sell models, may already be in place for assigning propensity scores to each B-to-B prospect and customer. However, additional data on a subset of these prospects and/or customers may now be readily available to augment these demand-based attributes. Integrating this new information into the pre-existing suite of models could be tricky, especially if that sample of prospects/customers is a biased subset of records (perhaps not in a quantifiable way) relative to the prospect/customer universe from which it is drawn.

Developing “Linkage Models” for Enhancing B-to-B Acquisition Strategies

Gabrielle Bedewi and Paul Raca


Typical Solution

The typical approach to appending the additional information to the prebuilt acquisition propensity model would be to rebuild the acquisition model using only that subset of records where the establishment characteristics and the additional external information are available. Consequently, the prospects/customers that could be scored with this updated model would be limited to that subset of records, and not the full prospect/customer database, because that additional data is not available about, nor collected for, the full prospect/customer base.

Enhanced Solution

Our challenge, from one of our clients, was to update a suite of B-to-B propensity-to-purchase models upon receiving new data collected for a sample of prospect establishments that were identified as most likely to be interested in a specific product family, where that level of interest was not necessarily based on the propensity scores of pre-defined statistical models.

Figure 1 (see page 22) illustrates the process that we implemented to generate the enhanced suite of propensity to purchase models.

A survey was conducted to collect additional information related to our client's B-to-B market characteristics for both customers and prospects. This additional data was collected to rank the survey responders' purchase and referral interests across two products. Consequently, the survey responders constituted only a small sample ( , establishments) of the . million establishment prospect base, with the further complication of implementing two separate referral rankings using two different sets of prospect attributes, as demonstrated in Figure 2 (above). These rankings are presented as a cross-tabulation of the small set of responders (whether customers or prospects), where the cells are defined by five ordinal categories for each set of attributes. These ordinal categories essentially represent Likert scales of interest in the specific products.

Consequently, the challenge is how to incorporate these newly collected characteristics into the pre-existing propensity-to-purchase models, built using overlay establishment characteristics, despite the fact that the survey responses were only collected for a potentially biased subset of that client's prospect database.

Stage 1: Ordered Structure MLM Model

By first exploring and identifying the explanatory relationship between the sampled establishments' characteristics, as they appear on the prospect database, and their corresponding survey-based questionnaire responses of interest, we are able to deliver attitudinal and interest-based scores for the entire prospect base. In the second-stage model build, the pre-existing product family propensity models are rerun while incorporating these new attitudinal and interest-based scores as potential predictors that may affect these product families' purchase propensity.

For this application, a single target variable was needed for the Stage 1 ordered structure MLM model.

The Likert scale is a widely used questionnaire format developed by Rensis Likert, in which respondents are given statements and asked to respond by stating whether they strongly agree, agree, disagree, or strongly disagree. Further discussion is included in the reference by Grossnickle and Raskin (2001).

Further discussion is included in the reference by Agresti.


We simply added the values of the two (High End Product and Low End Product) recommendation scales together, thus treating each scale as having an equal weight of contribution to the overall product family interest score, and subtracted 1 from the resulting sum to maintain a scale ranging from 1 to 9. Other weighting schemes could potentially be explored and implemented to redistribute the importance of one set of attributes over another.

The above approach resulted in an ordinal target (dependent) variable with values ranging from 1 to 9, where a value of 9 represents prospects most likely to recommend these products, while the value of 1 represents prospects least likely to make such a recommendation. Furthermore, the sample of establishments selected for completing the product interest survey questionnaire was identified to represent prospects with a high level of interest. Consequently, we needed to supplement that sampled subset of establishments with an additional complementary random sample from the prospect database and assign these a target value of 0 for the recommendation score (being conservative in the assumption that they would have been a low-interest recommendation group and thus not selected for completing the interest survey questionnaire). The additional complementary sample of establishments from the prospect database helps in balancing out the selection bias in the original sample of surveyed establishments.

Therefore, the Stage 1 model dataset consists of the , survey responders and an additional random sample of , 3 establishments drawn from the overall prospect database. The dependent variable is a polychotomous ordered response variable, resulting from the above-mentioned combined scales, hence the need for developing an order-structured multinomial logistic regression. The model is built in such a way that the target variable value of 0 represents the base, and all the odds ratios associated with the other response levels are computed relative to that base. In fact, with this type of MLM model, nine different odds ratios are being estimated at the same time.

This model can be built using PROC LOGISTIC3 in SAS. β1 would represent the change in odds of being considered as having a high level of interest but scoring low for recommending either one of the two products. β2 represents the change in odds expected from an increase in the dependent variable from level 1 to level 2 of the combined scales, β3 represents the change in odds from level 2 to level 3, and so on. The coefficients for the independent variables vary for each level of the target variable. For instance, every independent variable ultimately determined significant for the final Stage 1 MLM model generates nine coefficients, one for each of the nine modeled levels (not including the base case) of the target variable. That is not to say that all of these coefficients will come out to be statistically significant. It is possible that an independent variable will only significantly impact a limited set of response levels.
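
The authors fit this in SAS with PROC LOGISTIC. As a rough, hedged analogue in Python, the sketch below fits a baseline-category multinomial logit with statsmodels (MNLogit), which likewise estimates one coefficient vector, and hence one set of odds ratios, per non-base response level; note that MNLogit ignores the ordering of the levels, whereas the authors' ordered-structure formulation exploits it. The simulated data and column names are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative Stage 1 dataset: survey responders (combined interest levels
# 1-9) plus a complementary random prospect sample assigned level 0.
rng = np.random.default_rng(2)
n = 3000
df = pd.DataFrame({
    "industry_flag": rng.integers(0, 2, n),
    "log_revenue": rng.normal(10, 1, n),
    "prior_relationship": rng.integers(0, 2, n),
    "interest_level": rng.integers(0, 10, n),   # fake 0-9 target for the sketch
})

X = sm.add_constant(df[["industry_flag", "log_revenue", "prior_relationship"]])
fit = sm.MNLogit(df["interest_level"], X).fit(method="bfgs", maxiter=500, disp=False)

# One coefficient vector per non-base level; exponentiate to read them as
# odds ratios relative to level 0.
print(np.exp(fit.params))

# Fitted probabilities for every response level, later combined into an
# expected interest score per establishment.
probs = fit.predict(X)
```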

To demonstrate, Table 1 lists the results of the Stage 1 Ordered Structure MLM. The independent variables can be grouped into the following categories:

1. Industry flags, to indicate whether a prospect establishment belongs to a certain SIC code grouping,

2. Market index flags, which may correspond to indicators of demand at a sub-industry level,

3. Company sizing variables, such as revenue, number of employees, and number of locations,

4. Company credit-worthiness, and

5. An indicator for whether any past relationship existed with our client.

As illustrated by the model output in Table 1, the Industry flag is a statistically significant predictor, stating that a prospect in this industry has an estimated odds ratio for having a response level of that is .3 times higher in comparison to response level 0. Furthermore, the odds of having a response level of are estimated to be . times higher than of having response level . Therefore, as this is a cumulative logit model (resulting in cumulative odds ratios for each interest level), and the coefficients associated with Industry are all positive, the overall estimate of the probability of observing a high response level becomes quite large for companies in this industry.

In applying this model to score the overall prospect database (over . million establishments), the regression's fitted values, which represent the expected probabilities at each response level, are evaluated. Depending upon the ratio of target establishments (with non-zero response levels) to non-targets (with response level 0) going into the model, these probabilities can be quite small. For instance, the expected unweighted probabilities associated with each response level are presented in Table 2 (page 26).

3 Refer to Stokes et al. for further details.


The overall probability of observing a high response level (primarily driven by what was observed in the survey sample, but also impacted by the size of the non-surveyed sample of prospect database establishments that were added to the model for response level 0) is less than 0.0 . Even the probabilities associated with the prospects that are most likely to purchase average about 0.033, as shown at the bottom of Table 2. Consequently, the odds ratios that result from this model can get very high and still yield very small probabilities.

Ultimately, the final score for each establishment in the prospect database of over . million records can be calculated by weighting each of the expected probabilities by its response level. This calculation results in an expected response level for each prospect. For the most part, these values will tend to be low.
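
In other words, the survey score is the probability-weighted average of the response levels. A tiny illustration (the probabilities below are made up, standing in for the Stage 1 model output):

```python
import numpy as np

# Fitted probabilities for response levels 0-9 (each row sums to 1).
probs = np.array([
    [0.90, 0.04, 0.02, 0.01, 0.01, 0.01, 0.005, 0.003, 0.001, 0.001],
    [0.60, 0.10, 0.08, 0.06, 0.05, 0.04, 0.030, 0.020, 0.010, 0.010],
])
levels = np.arange(probs.shape[1])
expected_level = probs @ levels   # probability-weighted expected response level
print(expected_level)
```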

Again, it is important to remember that the majority of the prospect database establishments were not selected for the survey sample in the first place, because they were deemed not as likely as the sampled group to have an interest in the product; thus they automatically have MLM scores that are somewhat skewed toward low values, since they generally share characteristics with establishments with target value 0. Consequently, our scored population, which is very large relative to the model dataset, averages an expected response score of 0.0 .

However, even though the scores are small, the resulting gradation in scores creates additional variability that becomes useful in the Stage 2 modeling. In fact, not all of the survey scores are small, which further illustrates the purpose of the Stage 1 model. In all, there are more than 3,000 non-sampled establishments from the overall population that receive a survey score of or more.

Clearly, these establishments share some characteristics with the establishments in the survey population such that their scores are much higher than average.

When the importance of the attitudinal survey variable is realized in Stage 2 and the scoring applied, these establishments will receive higher acquisition scores because of their similarity to the survey respondents.


Stage 2: Logistic Regression Enhanced Propensity-to-Purchase Models

Moving on to the second stage of the linkage modeling process, the resulting Stage 1 scores are incorporated into the suite of pre-existing propensity-to-purchase models. Consequently, for each of the Stage 2 logistic regression models, the original propensity-to-purchase model dataset is scored by applying the Stage 1 Ordered Structure MLM model. In the Stage 2 product family logistic regression models, the expected response probabilities at each response level are estimated using the same formula as would be used for a dichotomous-response logistic regression. That is:
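
(The displayed formula did not survive reproduction. Read literally, "the same formula as would be used for a dichotomous response" suggests that the fitted probability for level j is the logistic transform of that level's estimated linear predictor; the following is a hedged reconstruction, not the authors' exact notation.)

$$\hat{P}(Y_i = j \mid x_i) \;=\; \frac{\exp(\hat{\alpha}_j + x_i'\hat{\beta}_j)}{1 + \exp(\hat{\alpha}_j + x_i'\hat{\beta}_j)}, \qquad j = 1, \dots, 9.$$

(Under a full baseline-category multinomial logit, the denominator would instead be \(1 + \sum_{k=1}^{9} \exp(\hat{\alpha}_k + x_i'\hat{\beta}_k)\), with level 0 as the base.)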

With the Stage 1 model scores integrated into the Stage 2 modeling datasets, the product-family-level propensity models are re-run, including the original data elements (establishment characteristics) in addition to the new Stage 1 model's score.

When the product family propensity-to-purchase models were rebuilt, about half of them yielded exactly the same model; that is, the new score variable was not a significant predictor. As for the other half, the propensity models changed: the data collected in the survey did constitute new, useful information affecting the propensity to purchase, in one of two ways. The new survey-information-based score was either an additional predictor to the pre-existing list of predictors, causing some minor coefficient value changes, or a replacement for an original significant predictor. These scenario variations are included in Table 3 (page 27).

Recommended Validation Strategy

Upon completion of the two-stage modeling and scoring of the prospect database records, it is recommended that the results be validated against actual acquisition marketing results. At the time this article was written, the data regarding newly acquired customers was not yet available for validating these models. However, within the next six months we anticipate that there will be a sufficient number of establishments that will have purchased the product families of interest, so that we would be able to tabulate which deciles these establishments appeared within when the enhanced model scores were applied. Our expectation is that there will be a greater number of purchasing establishments among the top deciles resulting from the enhanced propensity-to-purchase models as compared to the original acquisition models (in place before the improvements resulting from the addition of survey-based interest/referral scores were incorporated). The statistical significance of these improvements will be tested using a χ2 hypothesis test, to conclude whether there is a significant difference between the purchasing establishment counts by decile from the enhanced models versus the original models. ■


References

Agresti, A. Proportional Odds Models. In An Introduction to Categorical Data Analysis. Wiley.

Grossnickle, J. and Raskin, O. Rating Scales. In The Handbook of Online Marketing Research: Knowing Your Customer Using the Net. McGraw-Hill, 2001.

Stokes, M. E., Davis, C. S., and Koch, G. G. Logistic Regression II: Polytomous Response. In Categorical Data Analysis Using the SAS System. SAS Institute.

Vogt, W. P. Dictionary of Statistics and Methodology: A Nontechnical Guide for the Social Sciences. Sage Publications.


Cross-Selling Optimization: A Model-Based Framework and its Applications

Hongjie Wang, David King, and Sean Christy

Fulcrum Analytics Inc.

Abstract

Effective cross-selling is an essential part of successful customer relationship management in database marketing. We present an innovative framework that allows marketing professionals to optimize cross-selling operations by providing two important inputs for the marketing decision process: a forecast of product category preferences, and the timing of a customer's next purchase. The model can be implemented with readily available statistical software, incorporates temporal choice better than existing models, and is flexible enough to accommodate varied direct marketing applications and fields. The central framework utilizes a multi-spell discrete survival modeling technique and can incorporate supply-side economics, such as product margin and product retention effects, into one optimization system. Our application of this approach across multiple projects in several different industries has provided empirical validation of both the strength and practicality of our methodology.

Introduction

From a strategic standpoint, much of the research and discussion of customer relationship management (CRM) has been on improving customer retention. From a more tactical and operational standpoint, marketers are increasingly interested in finding ways to improve their cross-selling marketing, since one of the reasons marketers want to maintain long-term relationships with their customers is to be able to sell them more value-adding products. Conversely, it is well established that the depth and breadth of product category purchases are directly related to customer loyalty and long-term profitability.

In today's competitive marketplace, companies face the challenge of greater competition for share of customer wallets, as well as customer mindshare. How to fully leverage customer knowledge and utilize every customer touch point, taking advantage of limited interactions by offering the most relevant value-adding products, is the key to a sustainable and profitable long-term relationship.

In this paper, we present an analytical framework to optimize a company's cross-selling operations. The core of this framework involves an innovative statistical model that takes into consideration both product choice and timing. The predictions of the model are then combined with supply-side economics to drive optimized cross-selling operations. Our approach can be calibrated by any direct marketing professional who is familiar with logistic regression, using a standard statistical software package such as SAS or R.

The remainder of this paper is organized as follows: Section 2 provides a review of different cross-selling modeling and related approaches; Section 3 describes the optimization framework; Section 4 summarizes the methodology we have used to develop the prediction model; Section 5 presents a case study using banking data; and finally, Section 6 concludes the paper with a summary and suggestions for future research.

2. Cross-Selling Modeling Research

Consumer cross-category choices have been a major topic for many years in traditional marketing research.2,3 Most researchers in this area have focused on product category dependency and joint price elasticity/sensitivities.


In our view, models introduced in the literature on this topic often require specialized software for implementation, do not model timing and choice simultaneously, and have limited applications to direct marketing. The paper by Vilcassim & Jain addresses the issue of consumer choice over time, and models the brand-choice process as a continuous semi-Markov chain. They show that there is a clear time dependency in brand transitions. Our approach is structurally very close to that of Vilcassim & Jain. However, instead of brand choice, we focus on product choice, and instead of using a proportional hazards approach to implement the semi-Markov chain, we use a more flexible discrete multiple-event competing risk model.

The CRM literature on choice modeling in general, and on cross-selling in particular, is growing rapidly, with many diversified approaches (see Kamakura et al. for a review). One approach applied to financial services is to use latent classes to analyze a consumer's product acquisition pattern. This approach can generate insights for product planning decisions; however, its application in direct marketing is limited. Another model, applied in a banking setting, analyzes cross-selling patterns within a random utility framework. This modeling approach is highly specialized, and is likely to be beyond the reach of most marketers.

Prinzie & Poel use a high-order Markov chain to model a customer's product sequences. However, their approach does not address product adoption timing. Knott, Hayes, and Neslin10 provide one of the first papers modeling next-product-to-buy in a direct marketing setting. The timing part of their model, however, is treated separately, rather than within one integrated model. The paper by Kuo and Chen offers an approach to modeling inter-purchase time and brand choice using panel data. They implement the entire model using SAS. However, their approach treats timing and choice separately, and is not geared towards prediction.

Indeed, most of the published research treats timing and product or brand choices as two separate processes. A noticeable exception is one integrated model used to tackle customer selection, product offer, and contact timing simultaneously. The drawback of this approach is that the model has a complex likelihood function and would require special software to estimate. The paper by Harrison and Ansell13 is one of the few we have found that suggests using a competing risk survival model to predict the next product, although no detail was given in terms of the model formulation or techniques.

Despite very active research and debate on cross-selling strategies, the extant research on the subsequent operational optimization is very limited. We hope our paper can help to fill the gap in this area.

3. Cross-Selling Optimization: An Analytic Framework

Our proposed cross-selling optimization framework contains three key components, as depicted in Figure 1 (below).

As with any advertising medium, direct marketing aims to persuade the customer to take a particular action.

Figure 1: An Analytic Framework for Cross-Selling Optimization


In order to prompt such action, direct marketing communications must incorporate three dimensions of communication relevancy: whom to target, what to offer, and when to communicate. It is the synergistic interaction of these three effects that influences a customer's final decision. Therefore, it is important that the proposed statistical approach integrate all these elements into a unified model.

In addition to customer preferences, it is equally important to consider supply-side economics in cross-selling operations. Specifically, we take into consideration the products' profit margins, the relative effects of each product on customer retention, and other business criteria. It is well established that, in general, the more products a customer has, the more loyal the customer becomes. To link retention and product adoption in a cross-selling framework quantitatively, we adopt a procedure similar to the one used by Iwata et al. We first use a simple Cox regression model to predict customers' probability of remaining active for the next year. By including product adoption decisions as predictors in the retention model, we can then derive the incremental retention effect of a product for any customer. Specifically, we derive ∆Rj as the increased probability of remaining active another year due to product j. For ease of discussion, we assume such incremental probability improvements are customer-independent. However, the proposed optimization framework is equally valid and applicable if we have customer- or segment-specific incremental retention effects. Furthermore, we have focused on a one-year incremental effect, but the framework can be generalized to a longer-term setting.
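
A hedged sketch of this retention step in Python, using the lifelines package as a stand-in for whatever survival software the authors used; the data, the column names, and the 12-month horizon are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simulated customer-level retention data: `has_prod_j` flags ownership of
# product j, `tenure_months` is the observed lifetime, `churned` the event flag.
rng = np.random.default_rng(3)
n = 5000
df = pd.DataFrame({
    "tenure_months": rng.exponential(36, n).round().clip(1),
    "churned": rng.integers(0, 2, n),
    "has_prod_j": rng.integers(0, 2, n),
    "log_balance": rng.normal(8, 1, n),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="tenure_months", event_col="churned")

# Delta R_j: increase in the probability of remaining active another 12 months
# when product j is held, everything else unchanged.
surv_without = cph.predict_survival_function(df.assign(has_prod_j=0), times=[12]).values.ravel()
surv_with = cph.predict_survival_function(df.assign(has_prod_j=1), times=[12]).values.ravel()
delta_R_j = (surv_with - surv_without).mean()
print(round(delta_R_j, 4))
```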

Like any optimization problem, it is critical to set up appropriate objective functions. It is important that a good balance be struck between objectives that are flexible enough to reflect practical business realities, and those that are focused enough to be amenable to standard optimization techniques such as linear or integer programming. As we have presented in Figure 1, there are different optimization criteria one could use, depending on the business priorities. Here we present one objective function: maximizing the incremental profit over the next year. In our experience, companies' existing product margins and retention metrics are annually based, so focusing on one year is appropriate.

Mathematically, let Pij denote the probability of customer i adopting product j, Mj denote the annualized profit margin of product j, ∆Rj denote the incremental retention effect of product j, and Vi denote the expected annual profit margin from customer i's other relationships if she remains active.

The annualized incremental value of offering product j to customer i is

INCij = Pij (Mj + ∆Rj Vi)     (1)

The first term inside the brackets is the value of adopting product j from the additional product profit margin, and the second term corresponds to the incremental value of retaining customer i by offering product j. Both the incremental retention effect ∆Rj and the product margin Mj are assumed to be invariant across customers. However, for companies maintaining such metrics at the individual customer or customer segment level, the annualized incremental profit can be derived similarly without any structural changes. Finally, we point out that, in our current discussion, we do not differentiate between the propensities of product adoption under campaign and non-solicitation conditions. However, if the historical data contains experimental-design-based data (campaign promotion vs. control cells), one can easily change Pij in Equation (1) to ∆Pij, the incremental propensity due to promotion.

The optimization can be performed at a customer level, where Equation (1) is used to determine the best product to offer to customer i. Segment- or portfolio-level optimization, such as the optimal campaign size and product offer mix, can also be performed by formulating more complex optimization problems.

The incremental margin of a product and its retention effects may not be the only beneficial outcomes for that product, so within the optimization framework one might need to incorporate other business considerations. For example, some products may have a so-called halo effect and, while perhaps not profitable in their own right, generate additional sales of other, higher-margin products. A common and convenient way to include additional business considerations is to assign a set of subjective weights to each of the products of interest.
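
A minimal sketch of the customer-level decision implied by Equation (1), with an optional subjective weight per product; all of the numbers and names below are invented for illustration.

```python
import numpy as np

# P[i, j]: customer i's predicted probability of adopting product j;
# M[j]: annualized product margin; dR[j]: incremental retention effect of
# product j; V[i]: customer i's expected margin from other relationships;
# w[j]: optional subjective business weight.
P = np.array([[0.02, 0.05, 0.01],
              [0.10, 0.03, 0.04]])
M = np.array([300.0, 80.0, 500.0])
dR = np.array([0.02, 0.01, 0.03])
V = np.array([1200.0, 400.0])
w = np.array([1.0, 1.0, 1.2])

# INC[i, j] = P_ij * (M_j + dR_j * V_i), optionally weighted.
INC = P * (M + np.outer(V, dR)) * w
best_product = INC.argmax(axis=1)   # customer-level optimization
print(INC.round(2))
print(best_product)
```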

4. Discrete-Time Competing Risk Survival Model

The seminal paper by Helsen & Schmittlein started a long stream of research using survival and duration models in marketing. Most of the applications, however, are single-event cases. For cross-selling operations, there are some unique challenges that we need to consider:

Each customer can potentially have multiple purchases or product adoption events in history.


We can expect purchase cycle dependency. It is well known, for example, that in many industries, such as retail or banking, a customer's purchase rate accelerates as the relationship progresses. Therefore, it is desirable to have separate baseline hazard functions for different cycles.

There are product sequence dependencies. Customers' choices for their next products are partially associated with what products they have had so far.

There can be duration dependency. In other words, the hazard function of inter-purchase time is not constant, but instead follows other, less restrictive distributions.

We should consider multiple sources of heterogeneity. There will be unobserved cross-sectional heterogeneity across different underlying latent segments of customers. In addition, since the model is longitudinal and dynamic, and since there could be multiple purchase events for the same customer, we should expect unobserved individual-specific factors.

To meet the above requirements, we propose a discrete multi-spell competing risk survival model. Mathematically, we have
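
(The displayed equation was lost in reproduction. The following is one plausible logit parameterization consistent with the description that follows; it is a reconstruction, not necessarily the authors' exact specification.)

$$h_{i,k \to k+1}(j, t) \;=\; \frac{\exp\bigl(\alpha_{kj}(t) + \beta' X_i(t) + \gamma_j' Y_i(t) + \delta_{kj}' W_i(t) + \mu_i\bigr)}{1 + \sum_{j'} \exp\bigl(\alpha_{kj'}(t) + \beta' X_i(t) + \gamma_{j'}' Y_i(t) + \delta_{kj'}' W_i(t) + \mu_i\bigr)}, \qquad \mu_i \sim N(0, \sigma^2).$$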

Here hi,k→k+1(j, t) is the hazard function of customer i at purchase cycle k (from the kth purchase to the (k+1)th purchase) for next product type j. The choice of time unit depends on the granularity of the data available; in most of our projects we have used month or week as the unit. Since this is a discrete survival model, the hazard function represents the conditional probability of adopting product j at discrete time t, given that such an event has not happened yet. Since there are multiple choices of future products, it is a competing risk model. The sets of baseline hazard functions are given by αkj(t), which can be estimated from empirical data directly. In our professional practice, we have used a number of functional forms, including linear, piecewise linear, log-linear, and polynomial functions, to model baseline hazards.

We assume all variables are time-varying, since static variables can be considered a special case of time-varying variables. We have included three types of variables. X is a vector of variables whose effects are independent of the next product choice. An example would be customer tenure, which is believed to be positively associated with the propensity to adopt additional products; if we have no reason, or no data, to support the hypothesis that its effects might be different across products, then we can model the effect of tenure as product-independent. Y is a set of variables whose effects are independent of purchase cycle. For example, the holiday season is a period when customers tend to purchase more gift items; this is, in general, applicable to both new customers and established customers, even though some segments of customers are known to purchase more heavily during holiday seasons than others. In the most general case, we have included a set of variables W whose effects differ across products and across purchase cycles. For example, a customer's historical product ownership, and the sequence in which products were acquired, are expected to be predictive of the next product, and we expect their effects to vary depending on the next product and on how many products this customer already has. Finally, µi is not the typical error term of a traditional regression model; rather, it is a customer-level shared frailty element. Indeed, there is no error term in the conventional survival model formulation, since one directly models the time distribution, not a location parameter of the distribution. The most common specification used in the literature is to draw µi from a Gamma distribution, because a hazard function should be positive. Since we use a logit parameterization in our applications, we use a Normal distribution N(0, σ2).

The preceding model formulation has several appealing features. First, it provides a flexible specification of the baseline hazard functions, which is important, since standard parametric distributions often inadequately fit empirical hazard rates. Second, it can include endogenous time-varying variables such as marketing mix and seasonality, both of which are known to influence a consumer's decision on the next product purchase. Third, we introduce a random effect component at the customer level, common to all episodes of events, to adjust for heterogeneity. Lastly, the model can be estimated using any standard statistical software capable of performing logistic regression. This is a very important distinction between our approach and some other approaches. In direct marketing, we often face time and budgetary constraints for model development, so it is important that standard tools can be used without excessive extra investment.
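
To illustrate the estimation route via standard logistic regression software, the sketch below expands simulated histories into a person-period ("customer-month") table and fits a multinomial logit over the competing product choices with statsmodels. It omits the customer-level random effect µi (which the authors handle with SAS PROC NLMIXED), and all names and data are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Person-period layout: one row per customer per month at risk between
# purchases.  `event` is 0 while nothing is adopted that month, or j = 1..3
# for the product adopted; covariates may vary month by month.
rng = np.random.default_rng(4)
rows = []
for cust in range(500):
    months_since_purchase = 0
    for month in range(1, 25):
        months_since_purchase += 1
        event = rng.choice([0, 1, 2, 3], p=[0.93, 0.03, 0.02, 0.02])
        rows.append({"cust": cust, "month": month,
                     "months_since_purchase": months_since_purchase,
                     "holiday": int(month % 12 in (11, 0)),
                     "event": event})
        if event:                       # a new purchase cycle starts after an adoption
            months_since_purchase = 0
pp = pd.DataFrame(rows)

X = sm.add_constant(pp[["months_since_purchase", "holiday"]])
fit = sm.MNLogit(pp["event"], X).fit(disp=False)  # discrete-time competing risks
hazards = fit.predict(X)                # per-month adoption probability by product
```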

5. A Banking Case Study

We illustrate the model and cross-selling optimization using a banking case study. The data we have used for this application comes from the business banking division of a large regional US bank.


The sample consists of ,000 business customers who were acquired between 2000 and 200 , and who were active as of the end of 200 . We traced their history up to the end of 200 and used product adoption and transaction history in this five-year period to calibrate our model.

The products in this study included Business Demand Deposit Account (DDA), Business Savings, CD, Credit Card, Loan, Personal DDA, Personal Savings, and a miscellaneous product category. From the historical data, we have derived the following sets of variables as potential inputs to the model:

customer tenure at time t;

sets of indicators of customer ownership of any of the above products at time t;

accounts' average balances at time t;

original acquisition channel;

cumulative number of direct mail communications the customer received up to time t;

a set of monthly indicators (we have used January as the baseline);

the average short-term interest rate at time t; and

common firmographic variables, such as the size of the business.

The model is at the customer level, and the time unit we have used is the month. For the baseline hazard functions, we construct one for the first cycle (first product to second product), one for the second cycle, and one for all remaining cycles. With the exception of certain historical product ownership and balance information, most variables are assumed to have the same effect across different cycles and products, so as to produce a more parsimonious model. Our results showed that the goodness of fit is not reduced much by this simplification, and out-of-sample validation tends to be more stable with simpler models. We have tried various polynomial formulations of the hazard functions, and for this set of data the hazard functions are either monotonically decreasing or bathtub-shaped, which is consistent with the relatively long adoption cycle we have observed in the data. Finally, the individual-level random effect is determined to be significant. We have used the SAS procedure NLMIXED for the model building. For models without a random effect, standard procedures such as LOGISTIC or CATMOD can be used.

Table 1: Model Parameter Summary

Table 2: Model Performance


To protect client confidentiality, we have masked the baseline hazard function specification and estimates. Also, instead of the entire model, we report only the portion corresponding to a particular kth purchase cycle. The information provided is nonetheless structurally cohesive and sufficient for illustration purposes.

The estimated model was used to score another group: some , 00 new customers acquired in 200 , who were observed to have acquired additional products in 200 . We used the model to forecast, at the end of 200 , what their next product would be, and then used 200  data to validate the prediction. Table 2 cross-tabulates the actual next product purchased against the forecasted next product, where the forecasted next product is the one with the highest predicted probability for each customer. For example, of the 3 customers forecasted to purchase Business DDA,   actually purchased this product. In all categories, the model correctly forecasted the most popular cross-sell products these customers adopted in 200  (see Tables 1 and 2, page 32).
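This validation step is easy to reproduce once per-customer, per-product probabilities are available. A minimal sketch, with hypothetical dataframe and column names (not the paper's data), is:

```python
import pandas as pd

def validate_next_product(scores, actual_next):
    """Compare the forecasted next product (highest predicted probability per customer)
    with the product actually purchased.  `scores` has one row per customer x candidate
    product with a predicted probability; `actual_next` has one row per customer with
    columns customer_id and actual_product.  All names are illustrative."""
    forecast = (
        scores.loc[scores.groupby("customer_id")["prob"].idxmax(), ["customer_id", "product"]]
        .rename(columns={"product": "forecast_product"})
    )
    merged = forecast.merge(actual_next, on="customer_id")
    table = pd.crosstab(merged["forecast_product"], merged["actual_product"], margins=True)
    hit_rate = (merged["forecast_product"] == merged["actual_product"]).mean()
    return table, hit_rate
```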

One limitation of the model is in correctly forecasting less popular products, such as Business Savings in our case study, which had a lower predicted probability compared with other products. This is not unique to our model and can be partially overcome by incorporating the product margin into the optimization framework.

The other limitation we recognize is that the marketing optimization is on the customer's next purchase and not over the customer's lifetime purchases. Even though the product-offer choice at any particular campaign occasion maximizes the incremental value for the next year, it is still a greedy algorithm performing a local optimization. In the case of financial services, where the overall rate of cross-sell purchases is somewhat lower, this is less of an issue. To account for the stochastic nature of customer behavior and the desire to optimize longitudinally, a Markov Decision Process may be the required framework; research in this area is ongoing, and we expect it to become one of the future approaches available to marketing professionals.

In Section 3, we introduced an optimization framework that extends the predictive model and provides a practical method for implementation. To conclude this section, we provide a brief example of how to perform the optimization using the model's predictions along with other business inputs and metrics.

The decision as to which product to promote, and to whom, is a function not only of the likelihood of adoption, but also of the relative margin and other business factors. Table 3 (below) gives a simple example of three products that could be promoted to a particular customer, given the customer's probability of adopting each product during a specific time period, the product margins, and the subjective business weights of each product. The product with the highest value of this multiplication, product C, is the one promoted to the customer, even though product B may be more likely to be adopted.
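This decision rule is simply an argmax over probability times margin times business weight. A small sketch with made-up numbers (not the actual Table 3 values):

```python
def best_offer(adoption_prob, margin, weight):
    """Return the product with the highest probability x margin x business-weight score.
    All inputs are dicts keyed by product name; the values used here are illustrative."""
    scores = {p: adoption_prob[p] * margin[p] * weight[p] for p in adoption_prob}
    return max(scores, key=scores.get), scores

offer, scores = best_offer(
    adoption_prob={"A": 0.05, "B": 0.12, "C": 0.08},   # B is the most likely to be adopted
    margin={"A": 200.0, "B": 80.0, "C": 150.0},
    weight={"A": 1.0, "B": 1.0, "C": 1.2},
)
print(offer, scores)   # C wins on expected weighted value despite B's higher probability
```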

We conclude this section with an illustrative example of portfolio-level optimization. The unit of focus in our example is the customer segment, which allows us to derive a parsimonious objective function and to take advantage of any existing segment-level metrics, such as product margins or communication costs. Most importantly, this results in an optimization model that is amenable to simple and quick computational solutions.

Let ns denote the number of customers in segment s, where s indexes the customer segments.

To keep the notation simple, we will consider only two product choices: checking and other products. For a customer in segment s, let PsCHK and PsOTH denote the probabilities of opening a checking account and of opening other products, respectively. Let MsCHK be the average profit margin of checking from a customer in segment s, and MsOTH the corresponding margin for other products. Since our model provides product-adoption probabilities at the individual customer level, we can compute PsCHK as the average of those probabilities over the customers in segment s.

Assume that the marketing campaign has a budget of B, at least half of which must be allocated to promoting the checking product. The cost of mailing per customer is denoted by C. The objective function is therefore expected profit, maximized over the choice of which products to promote to which segments. XsCHK and XsOTH are the binary decision variables indicating the product offer for each segment.

Table 3


Mathematically, the optimization can be stated as an integer program over these decision variables.
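A plausible form of this program, reconstructed from the definitions above (our notation; we additionally assume each segment receives at most one offer), is:

$$\max_{\{X_s^{CHK},\,X_s^{OTH}\}}\;\sum_{s} n_s\Big[X_s^{CHK}\big(P_s^{CHK}M_s^{CHK}-C\big)+X_s^{OTH}\big(P_s^{OTH}M_s^{OTH}-C\big)\Big]$$

subject to

$$\sum_{s} n_s\,C\,\big(X_s^{CHK}+X_s^{OTH}\big)\le B,\qquad \sum_{s} n_s\,C\,X_s^{CHK}\ge \tfrac{B}{2},\qquad X_s^{CHK}+X_s^{OTH}\le 1,\qquad X_s^{CHK},X_s^{OTH}\in\{0,1\}.$$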

When the number of segments is manageable, the above integer programming problem can be readily solved in Excel. For larger-scale problems, more sophisticated solvers or other algorithmic heuristics can be gainfully employed to derive close-to-optimal solutions.
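For a handful of segments, the same search Excel would perform can be written directly as an exhaustive enumeration. The sketch below assumes the reconstructed formulation above, uses hypothetical field names, and reads the half-of-budget rule as a floor on checking mail spend:

```python
from itertools import product as assignments

def optimize_offers(segments, budget, mail_cost):
    """Brute-force the segment-level offer plan: each segment gets no offer, a checking
    offer, or an other-product offer.  `segments` maps segment name -> dict with keys
    n, p_chk, p_oth, m_chk, m_oth (hypothetical field names).  Practical only when the
    number of segments is small, as in the Excel case discussed above."""
    names = list(segments)
    best_profit, best_plan = float("-inf"), None
    for plan in assignments(("none", "chk", "oth"), repeat=len(names)):
        spend = chk_spend = profit = 0.0
        for name, offer in zip(names, plan):
            seg = segments[name]
            if offer == "none":
                continue
            cost = seg["n"] * mail_cost
            spend += cost
            if offer == "chk":
                chk_spend += cost
                profit += seg["n"] * (seg["p_chk"] * seg["m_chk"] - mail_cost)
            else:
                profit += seg["n"] * (seg["p_oth"] * seg["m_oth"] - mail_cost)
        if spend <= budget and chk_spend >= budget / 2 and profit > best_profit:
            best_profit, best_plan = profit, dict(zip(names, plan))
    return best_profit, best_plan
```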

Conclusion and Discussion

Given the increasing recognition of cross-selling as a critical and integral part of a company's CRM strategy, we expect marketers to seek more sophisticated modeling and optimization tools to improve the effectiveness of their cross-selling operations. In this paper, we have presented a comprehensive and flexible cross-selling optimization framework. To support this framework, we have proposed an innovative and powerful statistical model to estimate a customer's product and timing preferences. The model can be calibrated with standard statistical software. The resulting model provides insights into inter-product dependencies and delivers customer-level predictions of the next best product at any future time point of interest, which fits nicely with a customer-centric marketing orientation. We recognize that there are limitations to this approach, but believe it represents a significant step forward compared with current, widespread practices. We hope our article will stimulate further discussion and inspire new and improved ideas. ■

References

1. Kumar, V. and W. J. Reinartz, Customer Relationship Management: A Database Approach, John Wiley & Sons, Inc., 00 .
2. Russell, G. J. and A. Petersen, "Analysis of Cross-Category Dependence in Market Basket Selection," Journal of Retailing, Vol. , 3 3 , 000.
3. Hruschka, H., M. Lukanowicz and C. Buchta, "Cross-Category Sales Promotion Effects," Journal of Retailing and Consumer Services, Vol. , 0 .
4. Vilcassim, N. J. and D. C. Jain, "Modeling Purchase-Timing and Brand-Switching Behavior Incorporating Explanatory Variables and Unobserved Heterogeneity," Journal of Marketing Research, Vol. , .
5. Kamakura, W., A. Ansari, A. Bodapati, et al., "Choice Models and Customer Relationship Management," Marketing Letters, Vol. , , 00 .
6. Paas, L. and I. W. Molenaar, "Analysis of Acquisition Patterns: A Theoretical and Empirical Evaluation of Alternative Methods," International Journal of Research in Marketing, Vol. , 00, 00 .
7. Paas, L. and T. Kuijlen, "Acquisition Pattern Analyses for Recognizing Cross-Sell Opportunities in the Financial Service Sector," Journal of Targeting, Measurement and Analysis for Marketing, Vol. , 30 0, 000.
8. Li, S., B. Sun and R. T. Wilcox, "Cross-Selling Sequentially Ordered Products: An Application to Consumer Banking Services," Journal of Marketing Research, Vol. , 33 3 , 00 .
9. Prinzie, A. and D. Van den Poel, "Investigating Purchasing-Sequence Patterns for Financial Services Using Markov, MTD and MTDg Models," European Journal of Operational Research, Vol. 0, 0 3 , 00 .
10. Knott, A., A. Hayes and S. A. Neslin, "Next-Product-to-Buy Models for Cross-Selling Applications," Journal of Interactive Marketing, Vol. , , 00 .
11. Kuo, L. and Z. Chen, "A State Duration Model for Brand Choice and Inter-Purchase Time," Journal of Data Science, Vol. , , 00 .
12. Kumar, V., R. Venkatesan and W. Reinartz, "Knowing What to Sell, When, and to Whom," Harvard Business Review, Vol. , Issue 3, 3 3 , 00 .
13. Harrison, T. and J. Ansell, "Customer Retention in the Insurance Industry: Using Survival Analysis to Predict Cross-Selling Opportunities," Journal of Financial Services Marketing, Vol. , 3 , 00 .
14. Blattberg, R. C., G. Getz and J. S. Thomas, Customer Equity: Building and Managing Relationships As Valuable Assets, Harvard Business School Press, July 00 .
15. Iwata, T., K. Saito and T. Yamada, "Recommendation Method for Extending Subscription Periods," KDD 2006, , 00 .
16. Helsen, K. and D. Schmittlein, "Analyzing Duration Times in Marketing: Evidence for the Effectiveness of Hazard Rate Models," Marketing Science, Vol. , 3 , 3.


Do you leverage analytics to drive return on marketing investment?

Then our analytics-focused DMA Seminars are the perfect training opportunity for you.

Statistics & Modeling for Direct Marketers Seminar

Get up-to-speed on essential direct marketing statistical techniques that will enable you to amplify its power. This seminar has been around longer than any other DMA course. Since 1986, hundreds of companies have sent employees to this powerful seminar. Many think of it as their own internal training course that prepares staff to make informed decisions and better manage modeling projects.

YOU’LL LEARN: When and where to use multiple and/or logistic regression and discriminant, factor, and cluster analyses to:
• Increase response
• Improve conversions
• Manage retention
• Increase sales
• Segment your customer file

www.dmastatistics.org

Database Marketing Seminar

Gain a marketing-based perspective of the importance of Online Analytical Processing and advanced analytic techniques – including segmentation and predictive modeling. Get up-to-speed on how to effectively integrate your marketing efforts with information technology, quantitative analysis, finance, and merchandise functions in your organization to multiply your ROI.

YOU’LL LEARN:
• The tremendous cost of NOT doing database marketing
• How to apply predictive modeling to boost marketing ROI
• How to discuss database marketing issues confidently with technical professionals
• How to increase profits from each customer in your database

www.dmadatabase.org

DMA’s Distinguished Database Marketing Seminar is Now Offered Online

DMA’s new Database Marketing Online Seminar features the same powerful information presented in our in-person seminar. Enjoy the benefits of learning this valuable material at your own pace — when it’s most convenient for you!

www.dmadatabaseonline.org

NEW!


Join the DMA Analytics Council: membership applications are available from Diane Kaminskas, Direct Marketing Association, 1120 Avenue of the Americas, New York, NY 10036-6700, www.the-dma.org/councils. Analytics Council members must be employed by a DMA member company; membership begins upon receipt of payment, and Council members receive discount pricing and priority registration.

