Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
DATA SCIENCE FOR BUSINESS
Lecture 5. Customer relationship management. Churn prediction. Classification.
School of Data Analysis and Artificial Intelligence Department of Computer Science
Moscow, May 22nd, 2020.
CUSTOMER RELATIONSHIP MANAGEMENTCRM an approach to managing a company's interaction with current and potential customers to improve business relationships with customers, focusing on customer retention and increasing sales
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
Company functions• Marketing• Sales• Customer service support
CUSTOMER RELATIONSHIP MANAGEMENTThe service sector consists of the production of services instead of end products
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
• Competitive landscape, easy to switch• High customer acquisition cost• Repetitive interaction - service• Large customer LTV
Service sector industries• Telecommunication• Healthcare• Education• Financial services (banking, insurance, investment)• Professional services (accounting, legal, consulting)• Transportation• Hospitality and tourism• Mass media• Real estate• Utilities
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
• Propensity to buy new service or product• Propensity to respond to an offer• Propensity to churn
PROPENSITY MODELING Efficient customer communication and interaction
CLASSIFICATION ALGORITHMS
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
Examples of classification algorithms• Logistic regression• Decision trees• KNN k-nearest neighbors• Naïve Bayes• Ensemble methods:• Random forest • Gradient boosting decision trees (xgboost)• Neural networks & DL
Classification algorithms• Binary classification• Multi-class classification
Results:• Class assignments• Probability or score/ranking
LOGISTIC REGRESSION
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
Problem set-up:• Target variable y – binary 0/1 (class) • Predictors (x1,x,2,x3 ) – numerical
DECISION TREE
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
• Decision node – feature/attribute test and rule (numerical or categorical)• Leaf node – outcome (class in classification)
Problem set-up:• Target variable binary class / multi-class • Predictors (x1,x,2,x3 ) – numerical or categorical
RANDOM FOREST
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
Problem set-up:• Target variable binary class / multi-class • Predictors (x1,x,2,x3 ) – numerical or categorical
• Bootsrap sampling of examples • Subsample of features/predictors per tree• Voting for the result
MODEL TRAINING AND TESTINGSame rules apply as for regression algorithms
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
CLASSIFICATION EVALUATION Quality metrics
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
True positivesTP
FalsePositives
FP
FalseNegatives
FN
True negativesTN
Actual
NegativePositive
Predicted
Yes (+)
No (-)
True positive = Predict event and event happens
True negative = Predict event does not happen, nothing happens
False positive = Predict event and event does not happen (false alarm)
False negative = Fail to predict event that does happen (missed alarm)
CLASSIER EVALUATION Confusion matrix
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
True positivesTP
FalsePositives
FP
FalseNegatives
FN
True negativesTN
Actual
NegativePositive
Predicted
Yes (+)
No (-)
Positive50%
Negative50%
Negative10%
Positive90%
Data set 1 Data set 2
“Majority class” classifier ( classify all to positive)
CLASSIFIER EVALUATION Normalized confusion matrix
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
True positivesratetpr
Falsepositives
ratefpr
Falsenegatives
ratefnr
True negativesratetnr
ActualNegativePositive
Predicted
Yes (+)
No (-)
True positivesTP
FalsePositives
FP
FalseNegatives
FN
True negativesTN
NegativePositive
Predicted
Yes (+)
No (-)
P N
Actual
tpr = TP / P = TP /(TP+FN)fpr = FP / N = FP /(FP+TN)
fnr = FN / P = 1 - tprtnr = TN / N = 1 - fpr
Accuracy = (TP +TN)/ (P+N) = tpr*P/(P+N) + (1-fpr)* N/(P+N)
CLASSIFIER EVALUATION ROC space
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
instance is positive, alg - positive:tpr = TP/P = ‘specificity’ = ‘recall’
(% of positive detected)
instance is negative, alg - positive:fpr = FP/N = ‘false alarm rate’ = 1 – ‘sensitivity’
(% of negatives wrongly reacted to)
P NY 0.4 0.08N 0.6 0.92
P NY 0.9 0.6N 0.1 0.4
P NY 0.6 0.6N 0.4 0.4
no positives
all positives
RANKING CLASSIFIERSThreshold on ranking
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
RECEIVER OPERATING CURVE (ROC)
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
RECEIVER OPERATING CURVE (ROC)Classifier quality metric – ROC AUC
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
ROC AUC = area under the ROC
curve
ROC AUC = 1 perfectROC AUC =0.5 random (bad)
CUMULATIVE RESPONSE AND LIFT CURVESRanking classifiers
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
Lift of a classifier represents the advantage it provides over random guessing.
Cumulative response curves plot the percentage of positives correctly classified (tpr = TP/P) , as a function of the percentage
of the instances in the data
CUSTOMER CHURN MODELING
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
COST-SENSITIVE LEARNINGCost-benefit analysis
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
p(Y|p) = tprp(Y|n) = fprp(N|p) = fnrp(N|n) = tnr
p(p) = P/(P+N)p(n) = N/(P+N)
PROFIT CURVESProfit from churn management campaign
School of Data Analysis and Artificial Intelligence Department of Computer Science
.
P NY 0.92 0.14N 0.08 0.86
P N
Y $90 -$10
N $0 $0∑ $44.7
ONE MORE BOOK
School of Data Analysis and Artificial Intelligence Department of Computer Science
.