Machine learning Techniques applied to CRM
PUBLIC: PrUning & BuiLding Integrated Classifier
Presented by Soumen Sengupta
Customer Relationship Management
CRM is a process that manages the relationship between a company and its customers.
- Improving customer profitability increases ROI
- Integrating data mining with marketing
- Efficient algorithms for prediction
Data Mining applied to CRM
Customer Segmentation Customer Profiling
Customer Acquisition
Cross selling/ Up-selling
Customer Retention
Customer Segmentation and Customer Profiling
Customer Segmentation: a method that allows companies to know who their customers are, how they differ from each other, and how they should be treated.
e.g. RFM (Recency, Frequency, Monetary), demographic segmentation, psychographic segmentation, targeted segmentation
Customer Profiling: describing a customer by attributes such as age, income, lifestyle, etc. Different marketing media are applied to different segments.
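As a concrete illustration of RFM, here is a minimal stdlib-only sketch; the transaction log, customer names, and amounts are made up for illustration:

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("alice", date(2024, 3, 1), 120.0),
    ("alice", date(2024, 3, 20), 80.0),
    ("bob",   date(2023, 11, 5), 40.0),
    ("bob",   date(2024, 1, 15), 25.0),
    ("bob",   date(2024, 2, 2), 30.0),
]

def rfm(transactions, today):
    """Compute (recency_in_days, frequency, monetary) per customer."""
    scores = {}
    for cid, when, amount in transactions:
        rec, freq, mon = scores.get(cid, (None, 0, 0.0))
        days = (today - when).days
        rec = days if rec is None else min(rec, days)  # most recent purchase
        scores[cid] = (rec, freq + 1, mon + amount)
    return scores

scores = rfm(transactions, today=date(2024, 4, 1))
# alice bought more recently and spent more per order than bob
```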
CRM functions contd.
Customer Acquisition: acquiring new customers by turning a group of potential customers into actual customers. Customer responses to marketing campaigns are analyzed.
[Diagram: Customer Interaction, Data Collection, Data Mining]
Customer Retention
Customer Retention: a predictive model is built to identify customers who are likely to churn, e.g. attrition in the cellular telephone industry
[Figure: decision tree for churn prediction. Root split on Phone Technology (Old / New); further splits on Customer Lifetime (<=2.3 years / >2.3 years) and Age (<=35 / >35); leaf class counts [20,0], [5,40], [5,10]]
Predicting churn in the telecommunication industry, adapted from [BeS97]
C: Churner
NC: Non-churner
Cross Selling
Cross-selling is the process of offering new products to existing customers
Modeling of individual customer behaviors
Scoring the models
Optimizing the scores
Machine Learning Techniques
Decision Tree
Artificial Neural Networks
Bayesian Classifier
Genetic Algorithms
Rule-based analysis, and more
Decision Tree: An operation overview
1. Select a splitting criterion (information gain, gain ratio, Gini index, chi-square test)
2. Apply recursive partitioning until all training examples are classified or the attributes are exhausted
3. Prune the tree
4. Test the tree
Feature Extraction
Test the model
Train the model
Handle overfitting of the data
Build phase
Prune Phase
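The build phase above can be sketched as a plain recursive partitioner. This is a stdlib-only illustration using the Gini index (one of the splitting criteria from step 1) on a made-up toy dataset; it is not the PUBLIC algorithm itself and it omits the prune phase:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Pick the attribute whose split gives the lowest weighted Gini."""
    best_attr, best_score = None, None
    for attr in range(len(rows[0])):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        score = sum(len(ys) / len(labels) * gini(ys) for ys in groups.values())
        if best_score is None or score < best_score:
            best_attr, best_score = attr, score
    return best_attr, best_score

def build(rows, labels):
    """Recursive partitioning (step 2): split until nodes are pure."""
    if len(set(labels)) == 1:
        return labels[0]                      # pure leaf: one class remains
    attr, score = best_split(rows, labels)
    if score >= gini(labels):                 # no split helps: majority leaf
        return Counter(labels).most_common(1)[0][0]
    node = {"attr": attr, "children": {}}
    groups = {}
    for row, y in zip(rows, labels):
        r, l = groups.setdefault(row[attr], ([], []))
        r.append(row)
        l.append(y)
    for value, (r, l) in groups.items():
        node["children"][value] = build(r, l)
    return node

# Toy churn-style records: (phone_tech, lifetime_bucket) -> churn label
rows = [("old", "short"), ("old", "long"), ("new", "short"), ("new", "long")]
labels = ["C", "NC", "C", "C"]
tree = build(rows, labels)
```

The resulting nested dict splits first on phone technology, then on lifetime inside the "old" branch.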
An example of Decision Tree for Credit Screening
[Figure: decision tree with splits on Work Class (Private firm / Self-employed / not inc), Income (>50k / <50K), Capital Gain, Credit History (Good / Not Good), Education (Bachelors / not Bachelors), and Satisfactory / Not Satisfactory outcomes, with Yes / No credit decisions at the leaves]
PUBLIC: An efficient decision tree classifier
A decision tree algorithm that integrates pruning into the building phase
- Produces trees that are smaller in size, which makes it computationally efficient
- More accurate for larger datasets
- Splitting criterion: information gain,
  Gain(X) = Info(S) - Σ_{j=1}^{n} (|S_j| / |S|) · Info(S_j)
  where the S_j are the subsets of S induced by the n outcomes of a split on attribute X, and Info is the entropy of the class distribution
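The gain criterion can be checked numerically. A small sketch with made-up class labels, assuming base-2 logarithms:

```python
import math
from collections import Counter

def info(labels):
    """Entropy Info(S) = -Σ p_i · log2(p_i) of a set of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(labels, subsets):
    """Gain(X) = Info(S) - Σ_j |S_j|/|S| · Info(S_j) for a candidate split."""
    n = len(labels)
    return info(labels) - sum(len(s) / n * info(s) for s in subsets)

# Made-up example: 8 records, two candidate splits
labels = ["C"] * 4 + ["NC"] * 4
perfect = [["C"] * 4, ["NC"] * 4]                      # separates the classes
useless = [["C"] * 2 + ["NC"] * 2, ["C"] * 2 + ["NC"] * 2]
print(gain(labels, perfect))   # 1.0: the split removes all uncertainty
print(gain(labels, useless))   # 0.0: the split tells us nothing
```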
Pruning
Occam's Razor: the simplest hypothesis is usually the best one.
Pruning is used to avoid overfitting
Improves accuracy, speed, and memory requirements
Produces a much smaller tree
Constraints: size, inaccuracy (misclassification)
Pruning the Tree and MDL
Pre-pruning
Stop growing the tree when it reaches a given number of nodes or a cost limit
Post-pruning
Cross Validation
Pessimistic pruning
Minimum Error based pruning
MDL
MDL applied to Decision Trees
MDL principle: the best tree is the one that can represent the classes of the records with the fewest number of bits.
A subtree S is pruned if the cost of encoding the records in S as a single leaf is less than the cost of encoding the subtree plus the cost of encoding the records in each of its leaves.
MDL Costs
Cost of encoding the records
Cost of encoding the tree
Cost of encoding the structure of the tree (1 bit per node: 1 for an internal node, 0 for a leaf)
Cost of encoding each split (C_split)
Cost of encoding the classes of the records in the leaves
Cost of encoding the records
Let S be a set of n records drawn from k classes, and let n_i be the number of records belonging to class i. The cost function C(S) for encoding the records is:
C(S) = Σ_i n_i · log(n / n_i) + ((k-1)/2) · log(n/2) + log( π^(k/2) / Γ(k/2) )
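Assuming natural logarithms (the log base only rescales the cost), C(S) can be computed directly with the standard library's `lgamma`:

```python
import math

def record_cost(counts):
    """C(S) for n records split into k classes with counts n_i:
    Σ n_i·log(n/n_i) + (k-1)/2·log(n/2) + log(π^(k/2) / Γ(k/2))."""
    n = sum(counts)
    k = len(counts)
    data_term = sum(ni * math.log(n / ni) for ni in counts if ni > 0)
    penalty = (k - 1) / 2 * math.log(n / 2)
    const = (k / 2) * math.log(math.pi) - math.lgamma(k / 2)
    return data_term + penalty + const

# A pure leaf is far cheaper to encode than an evenly mixed one
print(record_cost([100, 0]) < record_cost([50, 50]))  # True
```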
Split Costs of a subtree
Cost of encoding node N as a leaf: C(S) + 1
Lower bound on the cost of a subtree rooted at N with s splits:
2s + 1 + s·log a + Σ_{i=s+2}^{k} n_i
where s is the number of splits, a is the number of attributes, k is the total number of classes, and n_i is the number of records belonging to class i, with the classes ordered so that n_1 ≥ n_2 ≥ ... ≥ n_k
Each additional split adds 2 + log a to the bound and removes n_{s+2} from the sum, so adding a split lowers the bound only while 2 + log a < n_{s+2}
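A sketch of this lower bound, assuming base-2 logarithms and class counts sorted in decreasing order (the slide does not state a log base, so that choice is an assumption):

```python
import math

def split_cost_lower_bound(s, num_attrs, class_counts):
    """Lower bound 2s + 1 + s·log(a) + Σ_{i=s+2..k} n_i on the cost of a
    subtree with s splits; class counts are sorted in decreasing order."""
    counts = sorted(class_counts, reverse=True)
    remainder = sum(counts[s + 1:])          # classes i = s+2 .. k (1-indexed)
    return 2 * s + 1 + s * math.log2(num_attrs) + remainder

# Made-up example: counts [40, 20, 5] and 4 attributes. Each extra split
# costs 2 + log a = 4 but removes the next-largest uncovered count.
for s in range(3):
    print(s, split_cost_lower_bound(s, 4, [40, 20, 5]))
```

Here the bound keeps dropping through s = 2 because every removed count n_{s+2} exceeds 2 + log a.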
Pruning Algorithms compared
The Diabetes dataset has been used; results quoted from [MRA95]
- MDL produces a tree that is much smaller than those of the other pruning algorithms
- Its error rate is slightly better, although its execution time may be a little slower
- Doesn't need extra data for pruning

                      Error Rate   Tree Size   Execution time
C4.5                        24.7        65.5             0.1
Pessimistic Pruning         27.0       100.8             0.2
MDL                         24.1        34.8             0.2
Advantages of PUBLIC
- Easy to interpret
- Fast, and doesn't require too much training data
- Rules can be generated and ranked according to confidence and support
- Doesn't require extra data for pruning
- Avoids overfitting by using MDL-based pruning
- Reduces the I/O overhead and improves performance
- Improves accuracy as well
References
[BeST00] Alex Berson, Stephen Smith, Kurt Thearling: Building Data Mining Applications for CRM, McGraw-Hill, 2000.
[MRA95] Manish Mehta, Jorma Rissanen, Rakesh Agrawal: MDL-based Decision Tree Pruning, IBM Almaden Research Center, 1995.
[RS98] Rajeev Rastogi, Kyuseok Shim: PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning.
[BR01] Catherine Bounsaythip, Esa Rinta-Runsala: Overview of Data Mining for Customer Behavior Modeling, 2001.
[BeS97] A. Berson, S. J. Smith: Data Warehousing, Data Mining and OLAP, McGraw-Hill, 1997.
Questions