41
O P E N D A T A S C I E N C E C O N F E R E N C E_ BOSTON 2015 @opendatasci Victor S.Y. Lo May, 2015 Machine Learning Based Personalization Using Uplift Analytics: Examples and Applications Uplift Modeling Workshop

Uplift Modeling Workshop

  • Upload
    odsc

  • View
    254

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Uplift Modeling Workshop

O P E ND A T AS C I E N C EC O N F E R E N C E_ BOSTON

2015

@opendatasci

Victor S.Y. Lo

May, 2015

Machine Learning Based Personalization Using Uplift Analytics: Examples and Applications

Uplift Modeling Workshop

Page 2: Uplift Modeling Workshop

2

Outline Why do we need Uplift modeling? 10 min

Various methods for Uplift modeling 30 min

Break 5 min

Direct response vs. Uplift modeling 10 min

Prescriptive Analytics for Multiple Treatments 20 min

Q&A 10 min

Page 3: Uplift Modeling Workshop

3

Disclaimer:

This presentation does not represent the views or opinions of Fidelity Investments

Page 4: Uplift Modeling Workshop

4

0%

2%

4%

6%

8%

10%

1 2 3 4 5 6 7 8 9 10

Decile

Response rate

Average2.5%

Top decile lift (over random) = 4 timesTop 3 deciles lift = 2.6 times

Big LiftModelers: VERY SUCCESSFUL MODEL!

Response Modeling

Page 5: Uplift Modeling Workshop

5

Top 3 Deciles RandomTreatment 6.7% 2.5%Control 6.7% 2.5%Lift 0.0% 0.0%

Campaign Results

No LiftMarketers: VERY DISAPPOINTING!

Modelers: Not my problem, it is the mail design!

Page 6: Uplift Modeling Workshop

6

So, Who is Right?

Page 7: Uplift Modeling Workshop

7

A successful response model

1 2 3 4 5 6 7 8 9 10

A successful marketing campaign

What’s wrong with this picture?

Test 1 Test 2 Total

3.3%

2.7%3.0%

2.3%

1.7%2.0%

Treatment Response RateControl Response Rate

14%

7%

4%2% 1% 1% 0% 0% 0% 0%

1 2 3 4 5 6 7 8 9 10Decile

Incidence of Treatment Responders

0%

50%

100%

0% 50% 100%Pct o

f Tre

atm

ent R

espo

nder

s

Pct of Treatment Group

CUME Pct of Responders Random

DM LIFT? DM LIFT?

Page 8: Uplift Modeling Workshop

8

Motivation

Based on the following campaign result, which of the customer groups is the best for future targeting ?

Treatment Control Difference<35 0.5% 0.2% 0.3%35-60 2.5% 0.5% 2.0%>60 3.5% 2.5% 1.0%

Age

Response Rate By Age and Treatment/Control

• >60 has the highest response rate – treatment-only focus (common practice)

• 35-60 has the highest Lift (most positively influenced by the treatment)

Page 9: Uplift Modeling Workshop

9

Framework for Causal and Association Analysis

Causal Inference

(Lift Analysis, Average Treatment

Effect)

Uplift Modeling(Heterogeneous Treatment Effect,

Effect Modification)

Reporting / Summary Statistics

Response Modeling / Propensity Modeling

Population /Sub-population

Personalized

Fro

m A

ssoc

iatio

n to

Cau

salit

y

Granularity

Page 10: Uplift Modeling Workshop

10

1 2 3 4 5 6 7 8 9 10Decile

Treatment "Responders"

Control "Responders"

1 2 3 4 5 6 7 8 9 10Decile

Treatment "Responders"

Control "Responders"

The Uplift Model Objective Maximize the Treatment responders while minimizing

the control “responders”

True lift

True lift

A standard response model A uplift response model (Ideal)

Hypothetical data

Page 11: Uplift Modeling Workshop

11

Traditional Approach Uplift Modeling

Uplift Approaches

Previous campaign data

Control Treatment

Training data set

Holdoutdata set

Model

Previous campaign data

Control Treatment

Training data set

Holdoutdata set

Model Source: Lo (2002)

Page 12: Uplift Modeling Workshop

12

Uplift model solutions

0. Baseline results: Standard response model – treatment-only (as a benchmark)

1. Two Model Approach: Take difference of two models, Treatment Minus Control

2. Treatment Dummy Approach: Single combined model using treatment interactions

3. Four Quadrant Method

Page 13: Uplift Modeling Workshop

13

Method 1: Two Model Approach: Treatment - Control Model 1 predicts P(R | Treatment)

Model Sample = Treatment Group Model 2 predicts P(R | no Treatment)

Model Sample = Control Group

Final prediction of lift =

Treatment Response Score – Control Response Score

Pros: simple concept, familiar execution (x2) Cons: indirectly models uplift, the difference may be only noise, 2x

the work, scales may not be comparable, 2x the error, variable reduction done on indirect dependent vars

P(R | Treatment) P(R | no Treatment)

Page 14: Uplift Modeling Workshop

14

Method 2: Treatment Dummy Approach, Lo (2002)

1. Estimate both E(Yi|Xi;treatment) and E(Yi|Xi;control) and use a dummy T to differentiate between treatment and control:Linear logistic regression:

2. Predict the lift value (treatment minus control) for each individual:

)iTiXδ'iγTiXβ'exp(α1

)iTiXδ'iγTiXβ'exp(α)iX|iE(YiP

)i

Xβ'exp(α1

)i

Xβ'exp(α

)i

Xδ'i

Xβ'γexp(α1

)i

Xδ'i

Xβ'γexp(α

control|iPtreatment|iP i

Lift

Pros: simple concept, tests for presence of interaction effects Cons: multicollinearity issues

Page 15: Uplift Modeling Workshop

15

Method 3: Four Quadrant Method Model predicts probability of being in one

of four categories Dependent variable outcome (nominal)

= TR, CR, TN, or CN Model Population = Treatment &

Control groups together

Prediction of lift:

Pros: only one model required; more “success cases” to model after

Cons: not that intuitive…

Response

Yes No

Treatment

Yes TR TN

No CR CN

𝒁 (𝒙 )=𝟏𝟐

[𝑷 (𝑻𝑹|𝒙 )𝑷 (𝑻 )

+𝑷 (𝑪𝑵|𝒙 )𝑷 (𝑪 )

−𝑷 (𝑻𝑵|𝒙 )𝑷 (𝑻 )

−𝑷 (𝑪𝑹|𝒙 )𝑷 (𝑪 )

]

Lai (2006) generalized by Kane, Lo, Zheng (2014)

Page 16: Uplift Modeling Workshop

16

Gini and Top 15% Gini in Holdout Sample

Source: Kane, Lo, and Zheng (2014)

Page 17: Uplift Modeling Workshop

17

Simulated Example: Charity Donation 80-20% split between treatment and control

Randomly split into training (300K) and holdout (200K)

Predictors available: Age of donor

Frequency – # times a donation was made in the past

Spent – average $ donation in the past

Recency – year of the last donation

Income

Wealth

Page 18: Uplift Modeling Workshop

18

Holdout Sample PerformanceLift Chart on Simulated Data

Theoretical model: Two logistics for treatment and control

-0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Baseline Two model Lo (2002) Four Quadrant (KLZ) Random

Page 19: Uplift Modeling Workshop

19

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Baseline Random

Two Model Approach Treatment Dummy Approach

Four Quadrant (KLZ)

Gains Chart on Simulated Data

Gini Gini 15% Gini repeatability (R^2)Baseline 5.6420 0.5412 0.7311Method 1: Two Model approach 6.0384 0.7779 0.7830Method 2: Lo(2002), Treatment Dummy 6.0353 0.7766 0.7836Method 3: Four Quadrant Method (or KLZ) 5.9063 0.7484 0.7884

Page 20: Uplift Modeling Workshop

20

Online Merchandise Data From blog.minethatdata.com, with women’s merchandise

online visit as response

50-50% split between treatment and control (43K in total)

Randomly split into training (70%) and holdout (30%)

Predictors available:• Recency • Dollar spent last year• Merchandise purchased last year (men’s, women’s, both)• Urban, suburban, or rural • Channel – web, phone, or both for purchase last year

Page 21: Uplift Modeling Workshop

21

-0.06-0.04-0.02

00.020.040.060.08

0.10.12

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Baseline Lo(2002) trt dummy

Two model approach Four Quadrant (KLZ)

Random

Holdout Sample PerformanceLift Chart on Email Online Merchandise Data

Page 22: Uplift Modeling Workshop

22

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Baseline Random

Two Model Approach Treatment Dummy Approach

Four Quadrant (KLZ)

Gains Chart on Email Online Merchandise Data

Gini Gini 15% Gini repeatability (R^2)Baseline 1.8556 -0.0240 0.2071Method 1: Two Model approach 2.0074 0.0786 0.2941Method 2: Lo(2002), Treatment Dummy 2.4392 0.0431 0.2945Method 3: Four Quadrant Method (or KLZ) 2.3703 0.2288 0.3290

Page 23: Uplift Modeling Workshop

23

Ideal Conditions for Uplift Modeling

A randomized control group is withheld!

Treatment does not cause all “responses,” i.e. control response rate > 0

Natural Response is not highly correlated to Lift

Lift Signal-to-Noise ratio (Lift/control rate) is large enough

Page 24: Uplift Modeling Workshop

24

Case I: Direct Response versus Uplift

Page 25: Uplift Modeling Workshop

25

Direct Response vs. Uplift Modeling

• Retailer couponing

• E-mail click-through

Page 26: Uplift Modeling Workshop

26

Any Customer

Treatment (T)

Direct Response (D)

Response (R)

No Direct Response

(Dc)

Response (R) No Response (N)

Control (C)

Response (R) No Response (N)

Decision Tree of Campaign and Customers

Page 27: Uplift Modeling Workshop

27

Any Customer

Treatment (T)

Direct Response (D)

Response (R)

No Direct Response

(Dc)

Response (R) No Response (N)

Control (C)

Response (R) No Response (N)

Decision Tree of Campaign and Customers

𝑃 (𝐷|𝑇 ,𝑥 ) 1 −𝑃 (𝐷|𝑇 ,𝑥 )

𝐿𝑖𝑓𝑡 (𝑥)=𝑃 (𝐷|𝑇 ,𝑥 )+𝑃 ¿𝑃 (𝑅|𝑇 ,𝑥 )

Page 28: Uplift Modeling Workshop

28

0

0.1

0.2

0.3

0.4

0.5

0.6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Lift in Response

Rate

Semi-decile

direct response only uplift direct response + uplift Baseline Random

Holdout Sample Validation of Simulated Data

Page 29: Uplift Modeling Workshop

29

Story of A Mathematician

Page 30: Uplift Modeling Workshop

30

Case II:

Optimization of Multiple Treatments - From Predictive Analytics to Prescriptive Analytics

Page 31: Uplift Modeling Workshop

31

A (or B)Improving Targeting

No Targeting,Single Treatment: A (or B)

Individual LevelTargeting - Model-based

No Targeting,Single Best Treatmentfor all individuals

Improving Treatment

Best of A and B

A

B

2) Target Selection4) Optimal Treatment for Each Individual

3) One Size Fits All1) Random Targeting

From Random Selection to Optimization

Page 32: Uplift Modeling Workshop

32

Subject to:

where = estimated lift value for individual i and treatment j, (decision variable) = 1 if treatment j is assigned to individual i and 0 otherwise; and = cost of promoting treatment j to i.

Integer Program Formulation

E.g., size of target population = 30 million, # treatment combinations = 10,

then # decision variables = 300 millions, and total # possible combinations without constraints = 2300,000,000!

Page 33: Uplift Modeling Workshop

33

A Heuristic Algorithm

1. Perform cluster analysis of the m model-based lift scores in the holdout sample

2. Compute cluster-level lift score for each treatment, using sample mean differences

3. Apply cluster solution to new data (for a future marketing program)

4. Solve a linear programming model to optimize treatment assignment at the cluster-level

Source: Lo and Pachamanova (2015)

Page 34: Uplift Modeling Workshop

34

Subject to:

= # individuals in cluster c to receive treatment j, = cost of treatment j for each individual.

Becomes A Much Simpler Optimization Problem

Can be solved by Excel Solver

Page 35: Uplift Modeling Workshop

35

Online Retail Example

Goal: Optimization of men’s and women’s merchandise

A 10-cluster solution

Page 36: Uplift Modeling Workshop

36

Page 37: Uplift Modeling Workshop

37

CLUSTER

Cluster Size in New Data

Obs. Lift in

response: Men's

Obs. Lift in

response: Women's

Cost per

treatment ($)

Decision var on

number of men's

Decision var on

number of

women's

Total number

of treated

by cluster

Overa 0.07408 0.0438631 4,180 0.1587 0.0224 1 4,180 - 4,180

2 5,650 0.0652 -0.0055 1 - - -

4 60,220 0.0658 0.0628 1 2,340 - 2,340

5 12,370 0.1290 0.0618 1 12,370 - 12,370

6 8,940 0.0672 0.0760 1 - 8,940 8,940

7 29,240 0.0519 0.0213 1 - - -

8 28,070 0.0868 0.0254 1 28,070 - 28,070

9 4,100 0.2249 0.0239 1 4,100 - 4,100

10 37,080 0.0572 0.0426 1 - - - Total 189,850 obj value 5,773 680 6,453

cost $51,060 $ 8,940 $60,000 Budget 60,000$

Linear Programming Solution from Excel Solver

Page 38: Uplift Modeling Workshop

38

Stochastic Optimization

Lift estimates can have high degree of uncertainty, stochastic optimization solutions take the uncertainty into account:

Stochastic Programming

Robust Optimization

Mean Variance Optimization

Page 39: Uplift Modeling Workshop

39

Mean Variance Optimization Example

Page 40: Uplift Modeling Workshop

40

Conclusion

• Uplift is a very impactful emerging subfield• Deserves more R&D

• Extensions are plenty (Lo (2008)):• Multiple treatments • Optimization• Non-randomized experiments• Direct tracking

• Applications in other fields• E.g. Potter (2013), Yong (2015)

Page 41: Uplift Modeling Workshop

41

ReferencesCai, T., Tian, L., Wong, P., and Wei, L.J. (2011), “Analysis of Randomized Comparative Clinical Trial Data for Personalized Treatment,” Biostatistics, 12:2, p.270-282,

Collins, F.S. (2010), The Language of Life: DNA and the Revolution in Personalized Medicine, HarperCollins.

Conrady, S. and Jouffe, L. (2011). “Causal Inference and Direct Effects,” Bayesia and Conrady Applied Science, at http://www.conradyscience.com/index.php/causality

Freedman, D. (2010). Statistical Methods and Causal Inference. Cambridge.

Hamburg, M.A. and Collins, F.S. (2010). “The path to personalized medicine.” The New England Journal of Medicine, 363;4, p.301-304.

Haughton, D. and Haughton, J. (2011). Living Standards Analytics, Springer.

Holland, C. (2005). Breakthrough Business Results with MVT, Wiley.

Kane, K., Lo, V.S.Y., and Zheng, J. (2014) “Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing Methods.” Journal of Marketing Analytics, v.2, Issue 4, p.218-238.

Lai, Lilly Y.-T. (2006) Influential Marketing: A New Direct Marketing Strategy Addressing the Existence of Voluntary Buyers. Master of Science thesis, Simon Fraser University School of Computing Science, Burnaby, BC, Canada.

Lo, V.S.Y. (2002) “The True Lift Model – A Novel Data Mining Approach to Response Modeling in Database Marketing.” SIGKDD Explorations 4, Issue 2, p.78-86, at: http://www.acm.org/sigs/sigkdd/explorations/issues/4-2-2002-12/lo.pdf

Lo, V.S.Y. (2008), “New Opportunities in Marketing Data Mining," in Encyclopedia of Data Warehousing and Mining, Wang (2008) ed., 2nd edition, Idea Group Publishing.

Lo, V.S.Y. and D. Pachamanova (2015), “A Practical Approach to Treatment Optimization While Accounting for Estimation Risk,” Technical Report.

McKinney, R.E. et al. (1998),”A randomized study of combined zidovudine-lamivudine versus didanosine monotherapy in children with sympotomatic therapy-naïve HIV-1 infection,” J. of Pediatrics,133, no.4, p.500-508.

Mehr, I.J. (2000), “Pharmacogenomics and Industry Change,” Applied Clinical Trials, 9, no.5, p.34,36.

Morgan, S.L. and Winship C. (2007). Counterfactuals and Causal Inference. Cambridge University Press.

Pearl, J. (2000), Causality. Cambridge University Press.

Potter, Daniel (2013) Pinpointing the Persuadables: Convincing the Right Voters to Support Barack Obama. Presented at Predictive Analytics World; Oct, Boston, MA; http://www.predictiveanalyticsworld.com/patimes/pinpointing-the-persuadables-convincing-the-right-voters-to-support-barack-obama/ (available with free subscription).

Radcliffe, N.J. and Surry, P. (1999). “Differential response analysis: modeling true response by isolating the effect of a single action,” Proceedings of Credit Scoring and Credit Control VI, Credit Research Centre, U. of Edinburgh Management School.

Radcliffe, N.J. (2007). “Using Control Groups to Target on Predicted Lift,” DMA Analytics Annual Journal, Spring, p.14-21.

Robins, J.M. and Hernan, M.A. (2009), “Estimation of the Causal Effects of Time-Varying Exposures,” In Fitzmaurice G., Davidian, M,, Verbeke, G., and Molenberghs, G. eds. (2009) Longitudinal Data Analysis, Chapman & Hall/CRC, p.553 – 399.

Rosenbaum, P.R. (2002), Observational Studies. Springer.

Rosenbaum, P.R. (2010), Design of Observational Studies. Springer.

Rubin, D.B. (2006), Matched Sampling for Causal Effects. Cambridge University Press.

Rubin, D.B. (2008), “For Objective Causal Inference, Design Trumps Analysis,” The Annals of Applied Statistics, p.808-840.

Rubin, D.B. and Waterman, R.P. (2006), “Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology,” Statistical Science, p.206-222.

Russek-Cohen, E. and Simon, R.M. (1997), “Evaluating treatments when a gender by treatment interaction may exist,” Statistics in Medicine, 16, issue 4, p.455-464.

Signorovitch, J. (2007), “Estimation and Evaluation of Regression for Patient-Specific Efficacy,” Harvard School of Public Health working paper.

Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd edition, MIT Press.

Wikipedia (2010), “Uplift Modeling,” at http://en.wikipedia.org/wiki/Uplift_modelling

Yong, Florence H. (2015), “Quantitative Methods for Stratified Medicine,” PhD Dissertation, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University.