Predictive Analytics for Everyone! Building CART Models using R
Chantal Larose
Assistant Professor of Decision Science (Statistics), School of Business, SUNY New Paltz
DASH Lab Workshop, March 8, 2017
Why Predictive Analytics?
Sports, healthcare, customer service – the world is full of data!
Fun to pull stories out of a mess of numbers
Examples:
A Logistic Regression Approach to Predicting Who Will Make the NBA Playoffs [1]
Data Mining Major League Baseball's Pace of Play Problem [2]
More sports applications at: New England Symposium on Statistics in Sports
Saturday, September 23, 2017
[1] Ryan Elmore, Department of Business Information and Analytics, Daniels School of Business, University of Denver
[2] Aaron Crowley, Zhuolin He, and Rachael Hageman Blair. Department of Biostatistics, State University of New York at Buffalo
[email protected] Predictive Analytics for Everyone! Building CART Models using R 1
Why R?
Open source, free to download
Active and helpful community
Appeals to non-programmers: Different user interfaces (e.g. RStudio) allow point-and-click interaction for some tasks
Appeals to programmers: Customizable – program your own functions, etc.
Set-up for the Workshop
Open up RStudio on your laptop (Apps → Other → RStudio)
Go to the Workshop’s website: hawksites.newpaltz.edu/dashlab/predictive-analytics-for-everyone/
Download the Churn data set (.csv file)
Download the Adult data set (.csv file)
Download the Do It Yourself! guide (.R file)
The analyses in this workshop are covered in more detail in Data Mining and Predictive Analytics, Second Edition. Larose & Larose, Wiley, 2015.
Getting Acquainted with R
Open the Do It Yourself! R file.
Getting Acquainted with R
Input the data set:
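As a rough sketch of this step, a .csv file can be read into a data frame with read.csv(). The file written below is a tiny made-up stand-in so the example runs anywhere; its rows and the name csv_path are assumptions, not the workshop's data. In the workshop you would point read.csv() at the downloaded Churn file instead.

```r
# Write a tiny made-up CSV to a temporary file (stand-in for the real download)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Day Mins,CustServ Calls,VMail Plan,Churn",
  "250.0,1,yes,False",
  "120.5,5,no,True",
  "300.2,0,no,True"
), csv_path)

# Read the file into a data frame; text columns become factors
churn <- read.csv(csv_path, stringsAsFactors = TRUE)

names(churn)  # read.csv() replaces the spaces in the headers with dots
dim(churn)    # number of rows and columns
```

Note that the column names change on import: a header such as Day Mins is referred to as Day.Mins in R code.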
Getting Acquainted with R
Let’s look at some code:
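For readers new to R, here is a small illustrative sample of the kind of commands you will meet. It uses only base R, and the values and the names calls, plan, and customers are made up for illustration.

```r
# Assign values with <- and combine them into vectors with c()
calls <- c(1, 4, 0, 2, 5)
plan  <- factor(c("yes", "no", "no", "yes", "no"))

# Put the vectors side by side in a data frame
customers <- data.frame(CustServ.Calls = calls, VMail.Plan = plan)

mean(calls)                  # average number of calls
table(customers$VMail.Plan)  # how many customers fall in each category
summary(customers)           # quick summary of every column

# Keep only the rows with four or more customer service calls
frequent <- customers[customers$CustServ.Calls >= 4, ]
frequent
```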
Getting Acquainted with R
How do we tell R to run the code?
1. Highlight the code and press the Run button
2. Put your cursor on the line you want to run and press Control+Enter (no need to highlight code)
Activity 1: CART Models
We want to predict the value of one variable, using other variables
For our first example:
We want to predict the value of Churn,
i.e. whether or not a customer leaves our company.
We will predict Churn using variables such as:
Day Mins: How many minutes during the day a customer uses their phone [3]
CustServ Calls: How many times a customer has called customer service
VMail Plan: Whether or not a customer has the voicemail plan
[3] The data come from a time when daytime and evening calls were charged differently
Activity 1: CART Models
Wait – What about regression?
The data may be too messy to meet the normality requirements, even with transformations
Regression interpretations get very complex very fast (especially with transformations)
The data set is too large! With enough records, the F and t tests from regression will come back significant even when the effect is too small to matter in practice
CART models generate easy-to-understand “decision rules” (IF this, THEN that) that make intuitive sense
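The point about large data sets can be illustrated with a small simulation. This is a sketch on simulated data, not the workshop's data: the slope of 0.01 and the sample size of one million are arbitrary choices made for the demonstration.

```r
set.seed(42)

n <- 1e6                   # a very large number of records
x <- rnorm(n)
y <- 0.01 * x + rnorm(n)   # x has only a tiny effect on y, by construction

fit <- lm(y ~ x)

# p-value of the t test for the slope, and the share of variance explained
p_value   <- summary(fit)$coefficients["x", "Pr(>|t|)"]
r_squared <- summary(fit)$r.squared

p_value < 0.05   # the test flags x as statistically significant
r_squared        # yet x explains almost none of the variation in y
```

Statistical significance here reflects the sample size rather than a relationship that would matter for decisions.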
Activity 1: CART Models - Setup
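As a rough sketch of what a CART setup can look like in R, the example below uses the rpart package, one common R implementation of CART (an assumption here, not necessarily the workshop's exact code). The data are simulated and the column names are hypothetical stand-ins for the Churn variables.

```r
library(rpart)  # CART implementation shipped with most R installations

set.seed(1)
n <- 500

# Simulated stand-in for the Churn data (values and names are made up)
churn <- data.frame(
  Day.Mins       = runif(n, 0, 350),
  CustServ.Calls = rpois(n, 1.5),
  VMail.Plan     = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Simulated target: heavy daytime users and frequent callers leave
churn$Churn <- factor(ifelse(
  churn$Day.Mins >= 264 | churn$CustServ.Calls >= 4, "True", "False"
))

# Fit a classification tree: target on the left of ~, predictors on the right
fit <- rpart(Churn ~ Day.Mins + CustServ.Calls + VMail.Plan,
             data = churn, method = "class")

print(fit)  # text version of the tree, one line per node

# Share of training records the tree classifies correctly
pred     <- predict(fit, churn, type = "class")
accuracy <- mean(pred == churn$Churn)
accuracy
```

Setting method = "class" asks rpart for a classification tree, which is what a True/False target such as Churn calls for.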
Activity 1: CART Models
CART Model to Predict Churn. # Correct / Total

False 2850 / 3333
  Day Mins < 264: False 2766 / 3122
    CustServ Calls < 3.5: False 2642 / 2871
      Int'l Plan = no: False 2476 / 2604
        Day Mins < 223: False 2161 / 2221
        Day Mins >= 223: False 315 / 383
          Eve Mins < 260: False 298 / 332
          Eve Mins >= 260: True 34 / 51
            VMail Plan = yes: False 11 / 11
            VMail Plan = no: True 34 / 40
      Int'l Plan = yes: False 166 / 267
        Intl Calls >= 2.5: False 166 / 216
          Intl Mins < 13: False 166 / 173
          Intl Mins >= 13: True 43 / 43
        Intl Calls < 2.5: True 51 / 51
    CustServ Calls >= 3.5: True 127 / 251
      Day Mins >= 160: False 111 / 149
        Eve Mins >= 142: False 106 / 130
          Day Mins >= 176: False 86 / 96
          Day Mins < 176: False 20 / 34
            Eve Mins >= 212: False 18 / 18
            Eve Mins < 212: True 14 / 16
        Eve Mins < 142: True 14 / 19
      Day Mins < 160: True 89 / 102
  Day Mins >= 264: True 127 / 211
    VMail Plan = yes: False 47 / 53
    VMail Plan = no: True 121 / 158
      Eve Mins < 188: False 32 / 57
        Day Mins < 278: False 21 / 25
        Day Mins >= 278: True 21 / 32
          Eve Mins < 144: False 7 / 8
          Eve Mins >= 144: True 20 / 24
      Eve Mins >= 188: True 96 / 101
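The "# Correct / Total" labels make it possible to work out how well the whole tree does. The sketch below copies the counts from the seventeen nodes of this tree that are not split any further (the leaves) and adds them up. The data frame name leaves is an assumption made for illustration.

```r
# Predicted class, # correct, and total for each leaf of the tree above
leaves <- data.frame(
  predicted = c("False", "False", "False", "True", "False", "True", "True",
                "False", "False", "True", "True", "True",
                "False", "False", "False", "True", "True"),
  correct = c(2161, 298, 11, 34, 166, 43, 51,
              86, 18, 14, 14, 89,
              47, 21, 7, 20, 96),
  total   = c(2221, 332, 11, 40, 173, 43, 51,
              96, 18, 16, 19, 102,
              53, 25, 8, 24, 101)
)

# Every record lands in exactly one leaf, so the totals add up to the data set
sum(leaves$total)

# Overall accuracy of the tree on these records
tree_accuracy <- sum(leaves$correct) / sum(leaves$total)
tree_accuracy

# Baseline from the root node: always predict "False"
baseline_accuracy <- 2850 / 3333
baseline_accuracy
```

Comparing the two numbers shows how much the splits improve on simply predicting that nobody churns.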
Activity 1: CART Models
CART Model to Predict Churn. # Correct / Total

False 2850 / 3333
  Day Mins < 264: False 2766 / 3122
    CustServ Calls < 3.5: False 2642 / 2871
    CustServ Calls >= 3.5: True 127 / 251
      Day Mins >= 160: False 111 / 149
      Day Mins < 160: True 89 / 102
  Day Mins >= 264: True 127 / 211
    VMail Plan = yes: False 47 / 53
    VMail Plan = no: True 121 / 158
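Each path through this tree is an IF-THEN decision rule, and its "# Correct / Total" gives the share of records at that node for which the rule holds. As a sketch, the branches shown on this slide can be written as a plain R function. The function name predict_churn and its argument names are hypothetical, chosen for illustration.

```r
# Apply the decision rules shown on this slide to one customer
predict_churn <- function(day_mins, custserv_calls, vmail_plan) {
  if (day_mins < 264) {
    if (custserv_calls < 3.5) {
      list(churn = "False", confidence = 2642 / 2871)
    } else if (day_mins >= 160) {
      list(churn = "False", confidence = 111 / 149)
    } else {
      list(churn = "True", confidence = 89 / 102)
    }
  } else if (vmail_plan == "yes") {
    list(churn = "False", confidence = 47 / 53)
  } else {
    list(churn = "True", confidence = 121 / 158)
  }
}

# A customer with few day minutes who has called customer service 5 times
predict_churn(day_mins = 120, custserv_calls = 5, vmail_plan = "no")

# A heavy daytime user who has the voicemail plan
predict_churn(day_mins = 300, custserv_calls = 1, vmail_plan = "yes")
```

Read aloud, the first call follows the rule: IF Day Mins < 264 AND CustServ Calls >= 3.5 AND Day Mins < 160, THEN Churn = True.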
Activity 1: CART Models
Activity 2: On Your Own!
After you complete the Churn example, go to Line 75 to begin Example 2.
All the code you need is there. Follow the directions and run the code.
Task: After building the CART model, use the model to find at least two decision rules. State the confidence level of each one.
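If your model was built with the rpart package (an assumption; the Do It Yourself! file may do this differently), one way to list decision rules and their confidence is to read them off the fitted object. The sketch below uses kyphosis, a small example data set that ships with rpart, purely as a stand-in for the Adult data. Confidence is taken here as the number of correctly classified records in a leaf divided by the leaf's total.

```r
library(rpart)

# Stand-in model on rpart's built-in kyphosis data (not the Adult data)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# fit$frame has one row per node; leaves are marked "<leaf>"
frame  <- fit$frame
leaves <- frame[frame$var == "<leaf>", ]

# For a classification tree, dev counts the misclassified records in a node
rules <- data.frame(
  node       = rownames(leaves),
  predicted  = attr(fit, "ylevels")[leaves$yval],
  total      = leaves$n,
  correct    = leaves$n - leaves$dev,
  confidence = (leaves$n - leaves$dev) / leaves$n
)
print(rules)

# Print the chain of split conditions (the IF part) leading to each leaf
path.rpart(fit, nodes = as.numeric(rownames(leaves)))
```

Each row of rules pairs with one printed path: the path is the IF part, predicted is the THEN part, and confidence is the share of records in that leaf for which the rule is correct.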