Beyond Churn Prediction: An Introduction to Uplift Modelling

Pierre Gutierrez

Plan
•  Introduction / Client situation
•  Uplift Use Cases
•  Global Uplift Strategy
•  Machine learning for Uplift
•  Uplift Evaluation
•  Conclusion

Material
•  Complete project: http://gallery.dataiku.com/projects/DKU_UPLIFT/
•  Notebooks & Data: https://github.com/PGuti/Uplift.git

Dataiku
•  Founded in 2013
•  60+ employees
•  Paris, New York, London, San Francisco
•  Data science software editor of Dataiku DSS

DESIGN
•  PREPARE: Load and prepare your data
•  MODEL: Build your models
•  ANALYSE: Visualize and share your work

PRODUCTION
•  AUTOMATE: Re-execute your workflow at ease
•  MONITOR: Follow your production environment
•  SCORE: Get predictions in real time


Motivations

Client situation
•  Client: French online gaming company (MMORPG)
•  Users are leaving (more than 10 years old)
•  Let's build a churn prediction model!
•  Target: no return within 14 or 28 days
   •  14 days of absence -> 80% chance of not coming back
   •  28 days of absence -> 90% chance of not coming back
•  Features:
   •  Connection features:
      •  Time played in the last 1, 7, 15, 30, … days
      •  Time since last connection
      •  Connection frequency
      •  Days of the week / hours of the day played
   •  Equivalent features for payments and subscriptions
   •  Age, sex, country
   •  Number of accounts, is a bot, …
   •  No in-game features (no data)

   

Client situation
•  Model results:
   •  AUC 0.88
   •  Very stable model over time
•  Marketing actions:
   •  7 different actions based on customer segmentation (offers, promotions, …)
   •  A/B test -> -5% churn for people contacted by email
•  Going further:
   •  Feature engineering: guilds, close network, in-game actions, …
   •  Study long-term churn, …


Uplift Definition
•  But wait!
•  Strong hypothesis: target the people who are the most likely to churn
•  What is the gain per person for an action?
   •  $c$: cost of the action
   •  $v_i$: fixed value of customer $i$ (hypothesis: $v_i$ is independent of the action)
   •  $X$: independent variables
   •  $T$: "treated" population, $C$: "control" population
   •  Churn indicator:
      $$Y = \begin{cases} 1 & \text{if the customer churns} \\ 0 & \text{otherwise} \end{cases}$$
   •  Value with action: $E_T(V_i) = v_i (1 - P_T(Y = 1 \mid X)) - c$
   •  Value without action: $E_C(V_i) = v_i (1 - P_C(Y = 1 \mid X))$
   •  Gain: $E(G_i) = v_i (P_C(Y = 1 \mid X) - P_T(Y = 1 \mid X)) - c$

•  Real target: people who are the most likely to change their behavior positively if there is an action
•  Uplift = model $\Delta P$, the probability difference in $E(G_i) = v_i (P_C(Y = 1 \mid X) - P_T(Y = 1 \mid X)) - c$

Uplift Definition
•  Gain to maximize: $E(G_i) = v_i (P_C(Y = 1 \mid X) - P_T(Y = 1 \mid X)) - c$
•  Targeting churners does not optimize the difference! It works only if the treatment is good.
•  Intuitive examples:
   •  $P_C(Y = 1) < P_T(Y = 1)$: the action is expected to make the situation worse. Spam?
   •  $P_C(Y = 1) \approx P_T(Y = 1)$: the user does not care
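To make the gain formula concrete, here is a minimal Python sketch that evaluates $E(G_i) = v_i (P_C - P_T) - c$ for three customers. The function name `expected_gain`, the customer labels and every number are hypothetical and chosen only for illustration; they do not come from the client data.

```python
def expected_gain(value, p_churn_control, p_churn_treated, cost):
    """Expected gain of treating one customer: v_i * (P_C - P_T) - c."""
    return value * (p_churn_control - p_churn_treated) - cost


# Hypothetical customers: (label, v_i, P_C(Y=1|X), P_T(Y=1|X))
customers = [
    ("persuadable", 100.0, 0.60, 0.30),
    ("likely churner, insensitive", 100.0, 0.90, 0.90),
    ("sleeping dog", 100.0, 0.20, 0.40),
]
COST = 2.0  # hypothetical cost c of one action

gains = {name: expected_gain(v, pc, pt, COST) for name, v, pc, pt in customers}
for name, gain in gains.items():
    print(f"{name}: expected gain = {gain:+.2f}")
```

In this made-up example the customer with the highest churn probability has a negative expected gain, because the action does not change that probability while the cost is still paid.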

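Uplift Definition

Customer segments, by response if treated and if not treated:
•  SURE THINGS: positive response if treated, positive response if not treated -> unnecessary costs
•  PERSUADABLES: positive response if treated, negative response if not treated -> the people we want to target
•  SLEEPING DOGS: negative response if treated, positive response if not treated -> negative impact
•  LOST CAUSES: negative response if treated, negative response if not treated -> unnecessary costs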
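The four segments can be written as a small lookup on the two potential responses. This is a conceptual sketch only: the function name `segment` is hypothetical, and for a real customer only one of the two responses is ever observed.

```python
def segment(positive_if_treated, positive_if_not_treated):
    """Name the segment from the two (hypothetical) potential responses."""
    if positive_if_treated and positive_if_not_treated:
        return "sure thing"      # responds anyway: unnecessary cost
    if positive_if_treated and not positive_if_not_treated:
        return "persuadable"     # responds only because of the action
    if not positive_if_treated and positive_if_not_treated:
        return "sleeping dog"    # the action has a negative impact
    return "lost cause"          # never responds: unnecessary cost


for treated in (True, False):
    for untreated in (True, False):
        print(treated, untreated, "->", segment(treated, untreated))
```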


Uplift Use Cases

Uplift Use Cases
•  Healthcare:
   •  Typical medical trial:
      •  Treatment group: gets the treatment
      •  Control group: gets a placebo (or another treatment)
   •  A statistical test shows whether or not the treatment works globally
   •  With uplift modeling we can find out for whom the treatment works best
   •  Personalized medicine
   •  Ex: What is the gain in survival probability? -> classification/uplift problem

Uplift Use Cases
•  Churn:
   •  E-gaming
   •  Other ex: Coyote
•  Retail:
   •  Compare the effect of coupon campaigns
•  Marketing / CRM:
   •  Churn
   •  E-mailing

Example
•  Mailing: Hillstrom challenge
•  2 campaigns:
   •  one men's email
   •  one women's email
•  Question: who are the people to target, i.e. those with the best response rate?

Uplift vs. Causal Inference methods
•  Causal inference is closer to econometrics
•  Uplift is closer to ML, more practical:
   •  Evaluation based on cross-validation
   •  Use of classical ML models
   •  Sometimes lacks theory
•  Different people who don't really talk to each other:
   •  Different notations (sorry). Today's are uplift's
   •  Different evaluation functions
   •  Different models? Not really!


Global Uplift Strategy

Uplift as a natural evolution
•  Step 1: train a (churn) model
•  Step 2: A/B test the model
•  Step 3: train your uplift model
•  Step 4: deploy
•  Capitalize on your A/B test data!

[Diagram, built up over five slides. Boxes: Train Data, Training, Churn Model, Test Data, A/B test on scored dataset, Training, Uplift Model, New Test Data, New scoring. One part of the diagram is marked "Today's Focus".]


Machine learning Model

Uplift modeling
•  Three main methods in the uplift literature:
   •  Two-model approach
   •  Class variable modification
   •  Modification of existing machine learning models (tree-based methods, out of today's scope)
•  Generalization: causal inference approach
•  Main assumption (unconfoundedness): assignment to the control or treatment group should be independent of the response

Uplift modeling: Two-model approach
•  Build a model on the treatment group to get $P_T(Y \mid X)$
•  Build a model on the control group to get $P_C(Y \mid X)$
•  Set: $\Delta P = P_T(Y \mid X) - P_C(Y \mid X)$

Uplift modeling: Two-model approach
•  Advantages:
   •  Standard ML models can be used
   •  In theory, two good estimators -> a good uplift model
   •  Works well in practice
   •  Generalizes easily to regression and multi-treatment
•  Drawbacks:
   •  The difference of two estimators is probably not the best estimator of the difference
   •  The two classifiers can ignore the weaker uplift signal (since it is not their target)
   •  An algorithm focused on estimating the difference should perform better
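The two-model approach can be sketched in a few lines of Python. To stay self-contained, the "model" here is just the observed response rate per discrete segment; any standard classifier could take its place. The helper names `fit_rate_model` and `two_model_uplift` and the toy A/B data are hypothetical.

```python
from collections import defaultdict


def fit_rate_model(rows):
    """Toy 'model': observed positive rate P(Y=1 | X=x) for each discrete x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [positives, total]
    for x, y in rows:
        counts[x][0] += y
        counts[x][1] += 1
    return {x: pos / total for x, (pos, total) in counts.items()}


def two_model_uplift(treated_rows, control_rows):
    """Fit one model per group and return Delta P = P_T(Y|X) - P_C(Y|X) per x."""
    model_t = fit_rate_model(treated_rows)
    model_c = fit_rate_model(control_rows)
    return {x: model_t[x] - model_c[x] for x in model_t if x in model_c}


# Hypothetical A/B test data: (segment, response) pairs
treated = [("new", 1), ("new", 1), ("new", 0), ("new", 1),
           ("old", 0), ("old", 1), ("old", 0), ("old", 0)]
control = [("new", 0), ("new", 1), ("new", 0), ("new", 0),
           ("old", 0), ("old", 1), ("old", 0), ("old", 0)]

uplift = two_model_uplift(treated, control)
print(uplift)
```

Each model is fitted on its own group only, which is why neither of them is ever asked to predict the difference directly.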

Uplift modeling: Class variable transformation
•  Introduced in Jaskowski, Jaroszewicz 2012
•  Allows any classifier to be adapted to uplift modeling
•  Let $G \in \{T, C\}$ denote the group membership (Treatment or Control)
•  Let's define the new target variable:
   $$Z = \begin{cases} 1 & \text{if } G = T \text{ and } Y = 1 \\ 1 & \text{if } G = C \text{ and } Y = 0 \\ 0 & \text{otherwise} \end{cases}$$
•  This corresponds to flipping the target in the control dataset.
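As a minimal sketch, the new target $Z$ can be computed row by row as follows. The function name `transform_target` and the encoding of the group as the strings "T" and "C" are assumptions made for this example.

```python
def transform_target(group, y):
    """Return Z = 1 for treated responders and control non-responders, else 0."""
    if group not in ("T", "C"):
        raise ValueError("group must be 'T' or 'C'")
    if group == "T":
        return y        # treated rows keep their label
    return 1 - y        # control rows get their label flipped


rows = [("T", 1), ("T", 0), ("C", 1), ("C", 0)]
z = [transform_target(g, y) for g, y in rows]
print(z)  # [1, 0, 0, 1]
```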

Uplift modeling: Class variable transformation
•  Why does it work?
   $$P(Z = 1 \mid X) = P_T(Y = 1 \mid X) \, P(G = T \mid X) + P_C(Y = 0 \mid X) \, P(G = C \mid X)$$
•  By design (A/B test warning!), $G$ should be independent of $X$, so
   $$P(Z = 1 \mid X) = P_T(Y = 1 \mid X) \, P(G = T) + P_C(Y = 0 \mid X) \, P(G = C)$$
•  Possibly after reweighting the datasets, we should have $P(G = T) = P(G = C) = 1/2$, thus
   $$2 P(Z = 1 \mid X) = P_T(Y = 1 \mid X) + P_C(Y = 0 \mid X)$$

Uplift modeling: Class variable transformation
•  Why does it work?
   $$2 P(Z = 1 \mid X) = P_T(Y = 1 \mid X) + P_C(Y = 0 \mid X) = P_T(Y = 1 \mid X) + 1 - P_C(Y = 1 \mid X)$$
•  Thus $\Delta P = 2 P(Z = 1 \mid X) - 1$
•  And sorting by $P(Z = 1 \mid X)$ is the same as sorting by $\Delta P$
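The identity $\Delta P = 2 P(Z = 1 \mid X) - 1$ can be checked numerically. The sketch below uses exact fractions and three hypothetical profiles; the helper `p_z` and all response rates are made up for the check, and balanced groups ($P(G = T) = 1/2$) are assumed.

```python
from fractions import Fraction


def p_z(p_t, p_c, p_group_t=Fraction(1, 2)):
    """P(Z=1|X) = P_T(Y=1|X) P(G=T) + P_C(Y=0|X) P(G=C)."""
    return p_t * p_group_t + (1 - p_c) * (1 - p_group_t)


# Hypothetical response rates (P_T(Y=1|X), P_C(Y=1|X)) for three profiles
profiles = {"a": (Fraction(3, 10), Fraction(1, 10)),
            "b": (Fraction(1, 2), Fraction(1, 2)),
            "c": (Fraction(1, 5), Fraction(2, 5))}

for name, (p_t, p_c) in profiles.items():
    delta_p = p_t - p_c
    assert 2 * p_z(p_t, p_c) - 1 == delta_p   # holds when P(G=T) = 1/2
    print(name, float(p_z(p_t, p_c)), float(delta_p))

# Ranking by P(Z=1|X) and ranking by Delta P give the same order
rank_by_z = sorted(profiles, key=lambda n: p_z(*profiles[n]), reverse=True)
rank_by_delta = sorted(profiles, key=lambda n: profiles[n][0] - profiles[n][1],
                       reverse=True)
print(rank_by_z, rank_by_delta)
```

With unbalanced groups the equality no longer holds as written, which is why the reweighting step matters.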

Uplift modeling: Class variable transformation
•  Summary:
   •  Flip the class for the control dataset
   •  Concatenate the treatment (test) and control datasets
   •  Build a classifier
   •  Target users with the highest probability
•  Advantages:
   •  Any classifier can be used
   •  Directly predicts uplift (and not each class separately)
   •  Single model on a larger dataset (instead of two small ones)
•  Drawbacks:
   •  Complex decision surface -> the model can perform poorly

Generalization
•  From Athey: let
   $$Y^* = Y \, \frac{G - e(X)}{e(X) \, (1 - e(X))}$$
   with $e(X) = P(G = 1 \mid X)$, where $G = 1$ denotes the treatment group
•  Then (unconfoundedness): $E(Y^* \mid X) = \Delta P$
•  Any classical estimator can be used
•  Generalizes to more advanced A/B test schemes
•  Specific estimators can be derived (see paper)
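A small numeric check of the transformed outcome may help. The sketch below computes $Y^*$ and its exact expectation for one profile by enumerating the four $(G, Y)$ outcomes; the function names and the probabilities are hypothetical, and $G$ is encoded as 1 for treated and 0 for control.

```python
def transformed_outcome(y, g, e):
    """Y* = Y * (G - e) / (e * (1 - e)), with G = 1 for treated, 0 for control."""
    return y * (g - e) / (e * (1.0 - e))


def expected_transformed_outcome(p_t, p_c, e):
    """Exact E(Y*) for one profile X, enumerating the four (G, Y) outcomes."""
    total = 0.0
    for g, p_group, p_y1 in ((1, e, p_t), (0, 1.0 - e, p_c)):
        for y, p_y in ((1, p_y1), (0, 1.0 - p_y1)):
            total += p_group * p_y * transformed_outcome(y, g, e)
    return total


# Hypothetical profile: P_T(Y=1|X) = 0.30, P_C(Y=1|X) = 0.10, 25% treated
estimate = expected_transformed_outcome(p_t=0.30, p_c=0.10, e=0.25)
print(round(estimate, 6))
```

The expectation matches $P_T - P_C$ for this profile even though the groups are unbalanced, which illustrates why this form generalizes beyond 50/50 A/B tests.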

Uplift modeling: Other methods
•  Based on decision trees:
   •  Rzepakowski, Jaroszewicz 2012: new decision tree split criterion based on information theory
   •  Soltys, Rzepakowski, Jaroszewicz 2013: ensemble methods for uplift modeling
•  (out of today's scope)


Model Evaluation

Evaluation
•  Problem:
   •  We don't have a clear 0/1 target.
   •  We would need to know, for each customer:
      •  the response to treatment
      •  the response to control
      -> not possible
•  Cross-validation:
   •  Train and validation split
   •  Stratified on the treatment (target) / control variable
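One way to obtain a split stratified on the treatment/control variable is sketched below in plain Python. The function `stratified_split`, its parameters and the toy dataset are hypothetical; a library splitter with a stratification option would serve the same purpose.

```python
import random


def stratified_split(rows, group_of, validation_fraction=0.25, seed=0):
    """Split rows into train/validation while keeping the treated/control mix."""
    rng = random.Random(seed)
    by_group = {}
    for row in rows:
        by_group.setdefault(group_of(row), []).append(row)
    train, validation = [], []
    for members in by_group.values():
        rng.shuffle(members)          # shuffle within each group
        n_val = round(len(members) * validation_fraction)
        validation.extend(members[:n_val])
        train.extend(members[n_val:])
    return train, validation


# Hypothetical dataset: 80 treated and 40 control rows, as (group, id) pairs
data = [("T", i) for i in range(80)] + [("C", i) for i in range(40)]
train, validation = stratified_split(data, group_of=lambda row: row[0])
print(len(train), len(validation))
```

Both parts keep the 2:1 treated-to-control ratio of the full dataset, so the uplift computed on the validation part compares groups of the expected sizes.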

Evaluation: Uplift Decile / Bins
•  Uplift bins:
   •  Sort the dataset by predicted uplift, descending
   •  Calculate the uplift per bin:
      $$U = \frac{Y_T}{N_T} - \frac{Y_C}{N_C}$$
      where $Y_T$ is the number of positives in treated, $Y_C$ the number of positives in control, $N_T$ the number in treated and $N_C$ the number in control
•  Hard to compare models

Evaluation: Uplift Decile / Bins
•  Cumulative uplift bins:
   •  Sort the dataset by predicted uplift, descending
   •  Calculate the uplift on all preceding bins
•  Cumulative uplift gain bins:
   •  Sort the dataset by predicted uplift, descending
   •  Calculate the uplift on all preceding bins
   •  Multiply by the number of instances
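The three bin-based views can be sketched together. The code below assumes rows of the form (predicted uplift, group, observed response), equal-sized bins, and that "number of instances" means the number of rows in the bins considered so far; these conventions, the function names and the toy data are assumptions of this example.

```python
def uplift(rows):
    """U = Y_T / N_T - Y_C / N_C for rows of (predicted_uplift, group, y)."""
    n_t = sum(1 for _, g, _ in rows if g == "T")
    n_c = sum(1 for _, g, _ in rows if g == "C")
    if n_t == 0 or n_c == 0:
        return float("nan")          # undefined without both groups
    y_t = sum(y for _, g, y in rows if g == "T")
    y_c = sum(y for _, g, y in rows if g == "C")
    return y_t / n_t - y_c / n_c


def uplift_bins(rows, n_bins):
    """Return (per-bin uplift, cumulative uplift, cumulative uplift gain)."""
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    size = len(ranked) // n_bins     # assumes len(rows) is a multiple of n_bins
    per_bin, cumulative, gain = [], [], []
    for b in range(n_bins):
        per_bin.append(uplift(ranked[b * size:(b + 1) * size]))
        head = ranked[:(b + 1) * size]   # this bin and all preceding ones
        cumulative.append(uplift(head))
        gain.append(uplift(head) * len(head))
    return per_bin, cumulative, gain


# Hypothetical validation set: (predicted uplift, group, observed response)
rows = [(0.9, "T", 1), (0.8, "C", 0), (0.7, "T", 1), (0.6, "C", 0),
        (0.4, "T", 0), (0.3, "C", 0), (0.2, "T", 0), (0.1, "C", 1)]
per_bin, cumulative, gain = uplift_bins(rows, n_bins=2)
print(per_bin, cumulative, gain)
```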

Evaluation: Uplift Curve
•  Generalization of the previous curve
•  Parametric curve defined by:
•  Similar to lift / ROC curve
•  Models can be compared! AUC

Evaluation: Qini
•  Introduced in Radcliffe
•  Parametric curve defined by:
   $$f(t) = Y_T(t) - Y_C(t) \, \frac{N_T(t)}{N_C(t)}$$
   where $t$ is the number of observations

Evaluation: Qini
•  Best model:
   •  Takes first all positives in treatment (target) and last all positives in control
   •  No theoretical best model:
      •  depends on the possibility of a negative effect
      •  displayed for no negative effect
•  Random model:
   •  Corresponds to the global effect of the treatment
•  Hillstrom dataset:
   •  For women, models are comparable and useful
   •  For men, there are no clear individuals to target
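A Qini curve following $f(t) = Y_T(t) - Y_C(t) \, N_T(t) / N_C(t)$ can be sketched as below. The function name `qini_curve`, the row layout (predicted uplift, group, observed response), the convention used while no control row has been seen yet, and the toy data are all assumptions of this example.

```python
def qini_curve(rows):
    """Qini values f(t) = Y_T(t) - Y_C(t) * N_T(t) / N_C(t), t = 1..len(rows).

    rows are (predicted_uplift, group, y); they are ranked by predicted uplift.
    """
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    y_t = y_c = n_t = n_c = 0
    curve = []
    for _, group, y in ranked:
        if group == "T":
            n_t += 1
            y_t += y
        else:
            n_c += 1
            y_c += y
        # Convention chosen here: no control rows yet -> no correction term.
        curve.append(y_t - (y_c * n_t / n_c if n_c else 0.0))
    return curve


# Hypothetical validation set: (predicted uplift, group, observed response)
rows = [(0.9, "T", 1), (0.8, "C", 0), (0.7, "T", 1), (0.6, "C", 0),
        (0.4, "T", 0), (0.3, "C", 0), (0.2, "T", 0), (0.1, "C", 1)]
curve = qini_curve(rows)
print(curve)
```

The last point of the curve uses the whole dataset, so it does not depend on the ranking; a random ranking is then the straight line from the origin to that point.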

Evaluation: Qini
[Two figure slides: Qini curves, x-axis: t (observations)]

Conclusion
•  Uplift modeling:
   •  Surprisingly little literature and few examples
   •  The theory is rather easy to test:
      •  Two models
      •  Class modification
   •  The intuition and evaluation are not easy to grasp
•  On the client side:
   •  A good lead to select the best offer for a customer -> can lead to more customer personalization
•  Applications:
   •  Churn, mailing, retail couponing, personalized medicine, …


Thank you for your attention !

A few references
•  Data:
   •  Churn in gaming: WOWAH dataset
   •  Uplift for healthcare: Colon dataset
   •  Uplift in mailing: Hillstrom data challenge
   •  Uplift in general: simulated data, available on gallery.dataiku.com
•  Demo:
   •  http://gallery.dataiku.com/projects/DKU_UPLIFT/
•  Applications:
   •  Uplift modeling for clinical trial data (Jaskowski, Jaroszewicz)
   •  Uplift Modeling in Direct Marketing (Rzepakowski, Jaroszewicz)
•  Modeling techniques:
   •  Rzepakowski, Jaroszewicz 2011 (decision trees)
   •  Soltys, Rzepakowski, Jaroszewicz 2013 (ensemble for uplift)
   •  Jaskowski, Jaroszewicz 2012 (class modification model)
•  Evaluation:
   •  Using Control Groups to Target on Predicted Lift (Radcliffe)
   •  Testing a New Metric for Uplift Models (Mesalles Naranjo)
•  Causal inference:
   •  Machine Learning Methods for Estimating Heterogeneous Causal Effects (Athey, Imbens 2015)
   •  Introduction to Causal Inference (Spirtes 2010)
   •  Causal inference in statistics: An overview (Pearl 2009)