Machine learning Techniques applied to CRM
PUBLIC: PrUning & BuiLding Integrated Classifier
Presented by Soumen Sengupta
Customer Relationship Management
CRM is a process that manages the relationship between a company and its customers.
- Improving customer profitability increases ROI
- Integrating data mining with marketing
- Efficient algorithms for prediction
Data Mining applied to CRM
Customer Segmentation Customer Profiling
Customer Acquisition
Cross selling/ Up-selling
Customer Retention
Customer Segmentation and Customer Profiling
Customer Segmentation: a method that allows companies to know who their customers are, how they differ from each other, and how they should be treated.
e.g. RFM (Recency, Frequency, Monetary), demographic segmentation, psychographic segmentation, targeted segmentation
Customer Profiling: describing a customer by attributes such as age, income, lifestyle, etc. Different marketing media are applied to different segments.
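As a concrete illustration of RFM, here is a minimal stdlib-only sketch; the transaction log, customer names, and amounts are made up for illustration:

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("alice", date(2024, 3, 1), 120.0),
    ("alice", date(2024, 3, 20), 80.0),
    ("bob",   date(2023, 11, 5), 40.0),
    ("bob",   date(2024, 1, 15), 25.0),
    ("bob",   date(2024, 2, 2), 30.0),
]

def rfm(transactions, today):
    """Compute (recency_in_days, frequency, monetary) per customer."""
    scores = {}
    for cid, when, amount in transactions:
        rec, freq, mon = scores.get(cid, (None, 0, 0.0))
        days = (today - when).days
        rec = days if rec is None else min(rec, days)  # most recent purchase
        scores[cid] = (rec, freq + 1, mon + amount)
    return scores

scores = rfm(transactions, today=date(2024, 4, 1))
# alice bought more recently and spent more per order than bob
```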
CRM functions contd.
Customer Acquisition: acquiring new customers by turning a group of potential customers into actual customers. Customer responses to marketing campaigns are analyzed.
[Diagram: Customer Interaction, Data Collection, Data Mining]
Customer Retention
Customer Retention: a predictive model is built to identify customers who are likely to churn, e.g. attrition in the cellular telephone industry
[Figure: decision tree for churn prediction. Root split on Phone Technology (Old / New); further splits on Customer Lifetime (<=2.3 years / >2.3 years) and Age (<=35 / >35); leaf class counts [20,0], [5,40], [5,10]]
Predicting churn in the telecommunication industry, adapted from [BeS97]
C: Churner
NC: Non-churner
Cross Selling
Cross-selling is the process of offering new products to existing customers
Modeling of individual customer behaviors
Scoring the models
Optimizing the scores
Machine Learning Techniques
Decision Tree
Artificial Neural Networks
Bayesian Classifier
Genetic Algorithms
Rule-based analysis, and more
Decision Tree: An operation overview
1. Select a splitting criterion (information gain, gain ratio, Gini index, chi-square test)
2. Apply recursive partitioning until all training examples are classified or the attributes are exhausted
3. Prune the tree
4. Test the tree
Feature Extraction
Test the model
Train the model
Handle overfitting of the data
Build phase
Prune Phase
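The build phase above can be sketched as a plain recursive partitioner. This is a stdlib-only illustration using the Gini index (one of the splitting criteria from step 1) on a made-up toy dataset; it is not the PUBLIC algorithm itself and it omits the prune phase:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Pick the attribute whose split gives the lowest weighted Gini."""
    best_attr, best_score = None, None
    for attr in range(len(rows[0])):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        score = sum(len(ys) / len(labels) * gini(ys) for ys in groups.values())
        if best_score is None or score < best_score:
            best_attr, best_score = attr, score
    return best_attr, best_score

def build(rows, labels):
    """Recursive partitioning (step 2): split until nodes are pure."""
    if len(set(labels)) == 1:
        return labels[0]                      # pure leaf: one class remains
    attr, score = best_split(rows, labels)
    if score >= gini(labels):                 # no split helps: majority leaf
        return Counter(labels).most_common(1)[0][0]
    node = {"attr": attr, "children": {}}
    groups = {}
    for row, y in zip(rows, labels):
        r, l = groups.setdefault(row[attr], ([], []))
        r.append(row)
        l.append(y)
    for value, (r, l) in groups.items():
        node["children"][value] = build(r, l)
    return node

# Toy churn-style records: (phone_tech, lifetime_bucket) -> churn label
rows = [("old", "short"), ("old", "long"), ("new", "short"), ("new", "long")]
labels = ["C", "NC", "C", "C"]
tree = build(rows, labels)
```

The resulting nested dict splits first on phone technology, then on lifetime inside the "old" branch.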
An example of Decision Tree for Credit Screening
[Figure: decision tree with splits on Work Class (Private firm / Self-employed / not inc), Income (>50k / <50K), Capital Gain, Credit History (Good / Not Good), Education (Bachelors / not Bachelors), and Satisfactory / Not Satisfactory outcomes, with Yes / No credit decisions at the leaves]
PUBLIC: An efficient decision tree classifier
A decision tree algorithm that integrates pruning into the building phase
- Produces trees that are smaller in size, which makes it computationally efficient
- More accurate for larger datasets
- Splitting criterion: information gain,
  Gain(X) = Info(S) - Σ_{j=1}^{n} (|S_j| / |S|) · Info(S_j)
  where the S_j are the subsets of S induced by the n outcomes of a split on attribute X, and Info is the entropy of the class distribution
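The gain criterion can be checked numerically. A small sketch with made-up class labels, assuming base-2 logarithms:

```python
import math
from collections import Counter

def info(labels):
    """Entropy Info(S) = -Σ p_i · log2(p_i) of a set of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(labels, subsets):
    """Gain(X) = Info(S) - Σ_j |S_j|/|S| · Info(S_j) for a candidate split."""
    n = len(labels)
    return info(labels) - sum(len(s) / n * info(s) for s in subsets)

# Made-up example: 8 records, two candidate splits
labels = ["C"] * 4 + ["NC"] * 4
perfect = [["C"] * 4, ["NC"] * 4]                      # separates the classes
useless = [["C"] * 2 + ["NC"] * 2, ["C"] * 2 + ["NC"] * 2]
print(gain(labels, perfect))   # 1.0: the split removes all uncertainty
print(gain(labels, useless))   # 0.0: the split tells us nothing
```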
Pruning
Occam's Razor: the simplest hypothesis is usually the best one.
Pruning is used to avoid overfitting
Improves accuracy, speed, and memory requirements
Produces a much smaller tree
Constraints: size, inaccuracy (misclassification)
Pruning the Tree and MDL
Pre-pruning
Stop growing the tree when it reaches a given number of nodes or a cost limit
Post-pruning
Cross Validation
Pessimistic pruning
Minimum Error based pruning
MDL
MDL applied to Decision Trees
MDL principle: the best tree is the one that can represent the classes of the records with the fewest number of bits.
A subtree S is pruned if the cost of encoding the records in S as a single leaf is less than the cost of encoding the subtree plus the cost of encoding the records in each of its leaves.
MDL Costs
Cost of encoding the records
Cost of encoding the tree
Cost of encoding the structure of the tree (1 bit per node: 1 for an internal node, 0 for a leaf)
Cost of encoding each split (C_split)
Cost of encoding the classes of the records in the leaves
Cost of encoding the records
Let S be a set of n records drawn from k classes, and let n_i be the number of records belonging to class i. The cost function C(S) for encoding the records is:
C(S) = Σ_i n_i · log(n / n_i) + ((k-1)/2) · log(n/2) + log( π^(k/2) / Γ(k/2) )
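Assuming natural logarithms (the log base only rescales the cost), C(S) can be computed directly with the standard library's `lgamma`:

```python
import math

def record_cost(counts):
    """C(S) for n records split into k classes with counts n_i:
    Σ n_i·log(n/n_i) + (k-1)/2·log(n/2) + log(π^(k/2) / Γ(k/2))."""
    n = sum(counts)
    k = len(counts)
    data_term = sum(ni * math.log(n / ni) for ni in counts if ni > 0)
    penalty = (k - 1) / 2 * math.log(n / 2)
    const = (k / 2) * math.log(math.pi) - math.lgamma(k / 2)
    return data_term + penalty + const

# A pure leaf is far cheaper to encode than an evenly mixed one
print(record_cost([100, 0]) < record_cost([50, 50]))  # True
```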
Split Costs of a subtree
Cost of encoding node N as a leaf: C(S) + 1
Lower bound on the cost of a subtree rooted at N with s splits:
2s + 1 + s·log a + Σ_{i=s+2}^{k} n_i
where s is the number of splits, a is the number of attributes, k is the total number of classes, and n_i is the number of records belonging to class i, with the classes ordered so that n_1 ≥ n_2 ≥ ... ≥ n_k
Each additional split adds 2 + log a to the bound and removes n_{s+2} from the sum, so adding a split lowers the bound only while 2 + log a < n_{s+2}
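A sketch of this lower bound, assuming base-2 logarithms and class counts sorted in decreasing order (the slide does not state a log base, so that choice is an assumption):

```python
import math

def split_cost_lower_bound(s, num_attrs, class_counts):
    """Lower bound 2s + 1 + s·log(a) + Σ_{i=s+2..k} n_i on the cost of a
    subtree with s splits; class counts are sorted in decreasing order."""
    counts = sorted(class_counts, reverse=True)
    remainder = sum(counts[s + 1:])          # classes i = s+2 .. k (1-indexed)
    return 2 * s + 1 + s * math.log2(num_attrs) + remainder

# Made-up example: counts [40, 20, 5] and 4 attributes. Each extra split
# costs 2 + log a = 4 but removes the next-largest uncovered count.
for s in range(3):
    print(s, split_cost_lower_bound(s, 4, [40, 20, 5]))
```

Here the bound keeps dropping through s = 2 because every removed count n_{s+2} exceeds 2 + log a.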
Pruning Algorithms compared
The Diabetes dataset has been used; results quoted from [MRA95]
- MDL produces a tree that is much smaller than those of the other pruning algorithms
- Its error rate is slightly better, although its execution time may be a little slower
- Doesn't need extra data for pruning

                      Error Rate   Tree Size   Execution time
C4.5                        24.7        65.5             0.1
Pessimistic Pruning         27.0       100.8             0.2
MDL                         24.1        34.8             0.2
Advantages of PUBLIC
- Easy to interpret
- Fast, and doesn't require too much training data
- Rules can be generated and ranked according to confidence and support
- Doesn't require extra data for pruning
- Avoids overfitting by using MDL-based pruning
- Reduces the I/O overhead and improves performance
- Improves accuracy as well
References
[BeST00] Alex Berson, Stephen Smith, Kurt Thearling: Building Data Mining Applications for CRM, McGraw-Hill, 2000.
[MRA95] Manish Mehta, Jorma Rissanen, Rakesh Agrawal: MDL-based Decision Tree Pruning, IBM Almaden Research Center, 1995.
[RS98] Rajeev Rastogi, Kyuseok Shim: PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning.
[BR01] Catherine Bounsaythip, Esa Rinta-Runsala: Overview of Data Mining for Customer Behavior Modeling, 2001.
[BeS97] A. Berson, S. J. Smith: Data Warehousing, Data Mining and OLAP, McGraw-Hill, 1997.
Questions