
Marketing Analytics with R

Disclaimer: All logos, photos, etc. used in this presentation are the property of their respective copyright owners and are used here for educational purposes only.

© Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.1


Statistical Analysis Software: Introduction

Topic Definition

Definition: Software designed for in-depth analysis, unlike MS Excel (a general-purpose spreadsheet)

Origins: SAS conceived in 1966 by Anthony J. Barr; placed statistical procedures in a formatted file framework

Uses: Advanced statistical techniques: nonlinear functions; multiple regression; conjoint analysis

Advantages: Powerful; accurate; specific tools

Disadvantages: Command line interface; steep learning curve; very expensive


Statistical Analysis Software: Supplier Companies

Topic Definition

Statistical Software:
  SAS: Market leader, especially in Fortune 500
  SPSS: Strong in education market (IBM)
  R: Open source
  Others: StatPac, StatSoft STATISTICA, etc.

Business Intelligence:
  Overall size: 2013: $13.8B; 2016: $17.1B
  IBM Cognos (2011: 12.1% of market)
  Microsoft BI (2011: 8.7% of market)
  Oracle Hyperion (2011: 15.6% of market)
  SAP Business Objects (2011: 23.6% of market)
  SAS Business Intelligence (2011: 12.6% of market)
  SPSS Modeler (2011: 0.4% of market)
  Others: GoodData, Panorama, Tableau, etc.

Gartner Press Release, “Gartner Says Worldwide Business Intelligence Software Revenue to Grow 7% in 2013.” February 19, 2013. http://www.gartner.com/newsroom/id/2340216
SAS Press Release, “SAS in Leaders Quadrant for Business Intelligence Platforms.” February 3, 2010. http://www.sas.com/news/preleases/biplatformsgartnerleader.html

Statistical Analysis Software: Supplier Companies

[Figure: Gartner Magic Quadrant, Business Intelligence, April 2011 (excerpts). Axes: Ability to Execute; Completeness of Vision. Quadrants: Challengers; Leaders; Niche; Visionaries. Vendors shown: Tableau, Microsoft, Oracle, IBM, SAS, SAP, Actuate, Panorama, GoodData, Jaspersoft.]

Kalakota, Ravi. PracticalAnalytics.Wordpress.com. “Gartner Says - BI and Analytics a $12.2B Market.” April 24, 2011. http://practicalanalytics.wordpress.com/2011/04/24/gartner-says-bi-and-analytics-a-10-5-bln-market/


Statistical Analysis Software: Supplier Companies

[Figure: Business Intelligence Technologies, TDWI Model. Axes: Complexity (up to High); Business Value (Low to High). Stages in order of increasing complexity and business value:
  Reporting (What happened?): Query, reporting, and search tools
  Analysis (Why did it happen?): OLAP, visualization tools
  Monitoring (What’s happening now?): Dashboards, scorecards
  Prediction (What might happen?): Predictive analytics]

Kalakota, Ravi. PracticalAnalytics.Wordpress.com. “Gartner Says - BI and Analytics a $12.2B Market.” April 24, 2011. http://practicalanalytics.wordpress.com/2011/04/24/gartner-says-bi-and-analytics-a-10-5-bln-market/


Statistical Analysis Software: Major Suppliers

Criteria    SAS             SPSS               R
Market      Fortune 500     Universities       Universities
Focus       Power           Ease of use        Price
User        Power user      Student            Price-sensitive
Origins     Industry        Education          Open source
Learning    Difficult       Moderate           Moderate
Cost        $86,600/yr+     $16,000/yr+        Free
UI          Command line    Point & click      Command line
Database    32,768 var.     1 file at a time
Graphics    SAS/Graph       High quality       Different packages
Analogy     Microsoft       Apple              Linux

UCLA, Statistical Software Packages Comparison, ats.ucla.edu: http://www.ats.ucla.edu/stat/mult_pkg/compare_packages.htm
MineQuest Business Analytics, “Cost of Licensing WPS 3.0 vs. SAS 9.3.” February 2013. http://www.minequest.com/downloads/Pricing_Comparisons_Between_WPS_and_SAS.pdf
IBM SPSS Statistics website, “Buy IBM SPSS Statistics Now.” http://www-01.ibm.com/software/analytics/spss/products/statistics/buy-now.html


R: Introduction

Topic Description

Description:
  Free software package for statistical computing and graphics
  Widely used among statisticians and data miners
  Increased popularity from 2010 onward

History:
  Started in 1993
  Implementation of the S programming language (1976)
  S offered an interactive alternative to Fortran programs
  S developed by John M. Chambers at Bell Laboratories
  R developed by Ross Ihaka and Robert Gentleman
  “R” from Ross & Robert, as well as a play on “S”

Commercial: Revolution Analytics offers enterprise version ($)

References:
1. Venables, W.N., Smith, D.M., “An Introduction to R.” Version 3.0.1. May 16, 2013. http://www.cran.r-project.org/doc/manuals/R-intro.pdf

R: Introduction

Topic Description

Features:
  Variety of statistical and graphical techniques
  Distributed under the GNU GPL (General Public License)
  GNU: “GNU’s Not Unix”; a recursive acronym

Advantages:
  Free
  Powerful
  Extensible through functions and extensions
  R community noted for its active contributions
  Different graphical user interfaces (GUIs) available

Disadvantages:
  Can be slow and memory-hungry
  Uses command line interpreter; no native GUI


R: Basics

Topic Description

Commands:
  Based on UNIX conventions; case sensitive
  Commands separated by “;” or by newline
  Compound expressions grouped in braces: { and }
  Comments begin with a hash mark: # Comment

Data Structure:
  Vector assignment: > x <- c(1, 2, 3, 4, 5.8)
  > : Prompt at beginning of line
  <- : Assignment operator
  c() : Function that combines its arguments into a vector

Class: “Numeric”; “Logical”; “Character”; “List”

Reading Data: read.table() function
  > HousePrice <- read.table("houses.data")
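The command rules and classes above can be tried in a short script. This is a minimal sketch, not part of the original slides; the names n, total, avg, and X are made up for illustration.

```r
# Numeric vector built with c(); <- assigns it to the name x
x <- c(1, 2, 3, 4, 5.8)

# Two commands on one line, separated by a semicolon
n <- length(x); total <- sum(x)

# A compound expression in braces evaluates to its last value
avg <- {
  s <- sum(x)
  s / length(x)
}

# class() reports the kind of object
class(x)              # "numeric"
class(x > 2)          # "logical"
class("house")        # "character"
class(list(x, "a"))   # "list"

# R is case sensitive: X and x are different names
X <- "a different object from x"
```

Running the lines one at a time shows each result in the console as it is evaluated.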


R: Basics

Topic Description


Function:
  R features a rich set of functions
  Statistics functions: mean(x); median(x); range(x); etc.
  Arithmetic: 4^2; log(10); sqrt(16)

Plots:
  > hist(x)                       # generates a default histogram
  > plot(x, y)                    # generates a quick x-y plot
  > quartz(height=4, width=10)    # makes a wide plot window (macOS only)
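A short sketch of these functions follows, using made-up values for x and y. The plots are written to a temporary PDF file (an assumption made here so the example runs without an interactive graphics window); in RGui, hist(x) and plot(x, y) can simply be run on their own.

```r
x <- c(2, 4, 6, 8, 10)   # made-up values for illustration
y <- x^2

m   <- mean(x)     # 6
med <- median(x)   # 6
rng <- range(x)    # 2 10 (minimum and maximum)

sq <- 4^2          # 16
lg <- log(10)      # natural logarithm, about 2.302585
rt <- sqrt(16)     # 4

# Send the plots to a temporary PDF instead of a screen window
plot_file <- file.path(tempdir(), "example-plots.pdf")
pdf(plot_file)
hist(x)      # default histogram
plot(x, y)   # quick x-y plot
dev.off()    # close the PDF device
```

Note that log() is the natural logarithm; log10() gives the base-10 logarithm.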


R: Getting Started

Topic Description

Download R:
  Windows: http://cran.r-project.org/bin/windows/base/
  Mac: http://cran.r-project.org/bin/macosx/

Launch R:
  Double-click to launch
  Will see the prompt “>” in the “R Console”

New Script:
  Select File > New Script
  Editor will open
  Arrange Editor window on left; Console on right

[Screenshot: “Untitled - R Editor” window (left) and “R Console” window (right) showing the prompt: > ]


R: Getting Started

Topic Description

Enter Vector:
  <- is the assignment operator (read as “gets”); it looks like an arrow
  Example: vector<-c(2, 4, 6, 8)

Run Line:
  Execute (run) the line
  Highlight the line in the R Editor; click the “Run Line” icon (3rd from left)
  Will see the “vector” assignment echoed in the Console

RGui Icons (left to right): Open Script; Save Script; Run Line; Return focus to Console; Print

Untitled - R Editor:
  vector<-c(2,4,6,8)

R Console:
  > vector<-c(2,4,6,8)


R: Getting Started

Topic Description

Statistics: Find statistics
  mean(vector) <RUN LINE> (mean)
  var(vector) <RUN LINE> (variance)
  sd(vector) <RUN LINE> (standard deviation)

Untitled - R Editor:
  vector<-c(2,4,6,8)
  mean(vector)
  var(vector)
  sd(vector)

R Console:
  > vector<-c(2,4,6,8)
  > mean(vector)
  [1] 5
  > var(vector)
  [1] 6.666667
  > sd(vector)
  [1] 2.581989
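To see where these numbers come from, the sketch below recomputes them by hand. var() and sd() use the sample formulas, dividing by n - 1 rather than n. The names n, deviations, by_hand_var, and by_hand_sd are made up for illustration.

```r
vector <- c(2, 4, 6, 8)
n <- length(vector)

# Deviations of each value from the mean of 5
deviations <- vector - mean(vector)           # -3 -1  1  3

# Sample variance: sum of squared deviations divided by (n - 1)
by_hand_var <- sum(deviations^2) / (n - 1)    # 20 / 3

# Standard deviation is the square root of the variance
by_hand_sd <- sqrt(by_hand_var)

all.equal(by_hand_var, var(vector))   # TRUE
all.equal(by_hand_sd, sd(vector))     # TRUE
```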


R: Getting Started

Topic Description

Directory: Load data file into R; typically entered as CSV

CSV File: Comma-Separated Values; “Save As” CSV in Excel

Example: Datafile.csv
  A, B, C, D   (identifiers)
  1, 2, 3, 4   (data for observation #1)
  2, 4, 6, 8   (data for observation #2)
  …

Load Data:
  Drag the CSV file and drop it into the R Console
  R will show the filepath: "C:\\My Documents\\R Files\\..."
  Type the dataset name and the read command into the R Editor:
    Datafile<-read.csv("C:\\My Documents…", header=TRUE); Run

Untitled - R Editor:
  Datafile<-read.csv("C:\\My ..", header=TRUE)

R Console:
  > load("C:\\My Documents\\..."
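The loading step can be tried without an existing file. The sketch below is an illustration only: it first writes the slide's example rows to a temporary CSV file (csv_path is a made-up name and location, not the path from the slide), then reads it back with read.csv.

```r
# Create a small CSV file in R's temporary directory (hypothetical location)
csv_path <- file.path(tempdir(), "Datafile.csv")
writeLines(c("A,B,C,D",
             "1,2,3,4",
             "2,4,6,8"), csv_path)

# header = TRUE tells read.csv that the first row holds the column names
Datafile <- read.csv(csv_path, header = TRUE)

Datafile          # print the data frame
names(Datafile)   # "A" "B" "C" "D"
nrow(Datafile)    # 2
```

Writing header = TRUE in full is a little safer than header = T, because T is an ordinary variable that can be reassigned.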


R: Getting Started

Topic Description

Directory:
  Alternative approach: set up a working directory for the dataset
  A working directory allows shorter filepaths
  Windows: See “Windows Explorer help” for more info
  Mac: See “Finder help” for more info

Data: Commands for structured datasets: str; summary; fix

str(): Structure
  Shows structure of Datafile; “data.frame: 4 obs. of 4 variables”

summary(): Summary
  Shows summary: Min; Max; Mean; Median

fix(): Fix
  Shows the data in matrix form so entries can be changed (fixed)

Untitled - R Editor:
  str(Datafile)
  summary(Datafile)
  fix(Datafile)

R Console:
  > (shows structure of datafile)
  > (shows summary of datafile)
  > (allows fixing of datafile)
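A sketch of the working-directory approach, under the assumption that R's temporary directory stands in for the real data folder (the slides do not name one). setwd() changes the working directory and getwd() reports it; the small two-row file is made up for illustration.

```r
# Switch to a working directory; setwd() returns the previous one
old_dir <- setwd(tempdir())

# With the working directory set, a bare file name is enough
writeLines(c("A,B,C,D", "1,2,3,4", "2,4,6,8"), "Datafile.csv")
Datafile <- read.csv("Datafile.csv", header = TRUE)

str(Datafile)       # structure: number of observations and variables, plus types
summary(Datafile)   # min, quartiles, median, mean, max for each column

# fix(Datafile) opens an interactive editor, so it is not run here

# Restore the original working directory
setwd(old_dir)
```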


R: Getting Started

Topic Description

Help: Get help with the “read.csv” command
  ?read.csv
  help(read.csv)

Help Results: help(read.csv) shows defaults:
  read.csv(file, header=TRUE, sep=",", quote="\"", dec=".", fill=TRUE, comment.char="", ...)
  Followed by explanations of commands and parameters

Untitled - R Editor:
  help(read.csv)

R Console:
  <Opens new window with help>
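Besides the help window, the defaults can also be inspected directly in the console. This is a small sketch using args() and formals(), both part of base R; the name defaults is made up for illustration.

```r
# Print the argument list of read.csv, including default values
args(read.csv)

# formals() returns the arguments as a list that can be queried by name
defaults <- formals(read.csv)
defaults$header   # TRUE
defaults$sep      # ","
defaults$dec      # "."
```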


R: Getting Started

Topic Description

Packages: Load packages when functions are missing

Load:
  Select Packages > Load Package from the RGui top menu
  Select CRAN mirror: USA (CA 1), UK (London), Vietnam, etc.

Install:
  Select Packages > Install Package
  Select the package from the scrolling list
  Note: lm() [regression analysis] is part of the stats package, which ships with R and needs no installation

RGui menu bar: File; Edit; Packages; Windows; Help

R Console:
  > chooseCRANmirror()
  > utils:::menuInstallPkgs()
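The same steps can be done from the console or a script instead of the menus. In this sketch the install line is commented out because it needs an internet connection, and "somePackage" is a placeholder name, not a real recommendation. The stats package is used for the loading step because it is installed with R.

```r
# Install from CRAN (placeholder package name; uncomment to run)
# install.packages("somePackage")

# Check whether a package is installed without attaching it
is_available <- requireNamespace("stats", quietly = TRUE)

# Attach the package so its functions can be called by name
library(stats)

exists("lm")                        # TRUE: lm() is available
environmentName(environment(lm))    # "stats": the package that provides lm()
```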


R: Regression

Topic Description

Data:
  Created dataset “RealData” of real estate values
  Data captures Price, House Size, and Lot Size for 20 houses
  Convert data to CSV format; Excel: “Save As” CSV

Load:
  Drag and drop data into R Console
  R Console: Copy filepath name
  R Editor: Paste filepath name; add read.csv command
  R Editor: Run Line

Structure: Check structure of dataset
  str(RealData)
  'data.frame': 20 obs. of 3 variables:
  Price: num 6 5.8 5.6 …; House: num 6.9 8 …; Lot: num 42.7 …

Untitled - R Editor:
  RealData<-read.csv("C:\\My ..", header=TRUE)

R Console:
  > load("C:\\My Documents\\..."


R: Regression

Topic Description

Dependent: Set Price as the dependent variable

Explanatory: Price is a function of the explanatory variables House and Lot

Equation: Price = c1 + c2*(House Size) + c3*(Lot Size)

lm: Regression analysis in R; stands for Linear Model

Syntax: lm(Dependent~Independent+Independent, Dataset)

Equation: lm(Price~House+Lot,RealData)
  Type into R Editor; Run Line
  See results in R Console

Untitled - R Editor:
  lm(Price~House+Lot,RealData)

R Console:
  > lm(Price~House+Lot,RealData)
  (Intercept)        House          Lot
     -0.55415      0.64680      0.02763
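The RealData file is not included here, so the sketch below uses made-up data (DemoData) built so that Price = 1 + 0.5*House + 0.02*Lot holds exactly. The numbers are illustrative assumptions, not the slide's dataset; the point is that lm() recovers the coefficients c1, c2, and c3 of the equation.

```r
# Made-up illustration data, NOT the slide's RealData
DemoData <- data.frame(
  House = c(5, 6, 7, 8, 9, 10),
  Lot   = c(40, 35, 50, 45, 60, 42)
)
# Price constructed exactly from the equation with c1 = 1, c2 = 0.5, c3 = 0.02
DemoData$Price <- 1 + 0.5 * DemoData$House + 0.02 * DemoData$Lot

# Dependent variable on the left of ~, explanatory variables on the right
DemoFit <- lm(Price ~ House + Lot, data = DemoData)

coef(DemoFit)   # (Intercept), House, Lot: approximately 1, 0.5, 0.02

# Use the fitted equation to estimate the price of a new house
new_house <- data.frame(House = 7.5, Lot = 48)
estimate <- predict(DemoFit, newdata = new_house)   # 1 + 0.5*7.5 + 0.02*48 = 5.71
```

The tilde (~) separates the dependent variable from the explanatory variables; replacing it with + gives an error because R then treats the whole expression as arithmetic instead of a formula.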


R: Regression

Topic Description

Results: Compare results from R with those from Excel
  Method    Intercept (c1)    House Size (c2)    Lot Size (c3)
  Excel     -0.554            +0.646             +0.027
  R         -0.55415          +0.64680           +0.02763

Interpretations:
  R results match those from Excel
  House size is an important factor when assessing price
  Lot size is not as important

Statistics: Option of calculating regression statistics
  RealReg<-lm(Price~House+Lot,RealData)
  summary(RealReg)
  Gives significance codes, R-squared, F-statistic
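A sketch of the summary() step, again on simulated data because RealData is not available here. DemoData, the coefficients used to generate it, and the noise level are all assumptions for illustration, so the printed numbers will differ from the slide's results.

```r
set.seed(1)   # make the simulated data reproducible

# Simulated data for 20 houses (made-up values, NOT the slide's RealData)
n <- 20
House <- runif(n, min = 5, max = 10)
Lot   <- runif(n, min = 30, max = 60)
Price <- 1 + 0.5 * House + 0.02 * Lot + rnorm(n, sd = 0.3)
DemoData <- data.frame(Price, House, Lot)

# Store the fitted model, then summarize it
DemoReg <- lm(Price ~ House + Lot, DemoData)
DemoSummary <- summary(DemoReg)

DemoSummary                 # full report with significance codes
DemoSummary$coefficients    # estimates, standard errors, t values, p-values
DemoSummary$r.squared       # share of variance in Price explained by the model
DemoSummary$fstatistic      # F-statistic with its degrees of freedom
```

Storing the model in a name such as DemoReg allows the same fit to be reused for summary(), coef(), and predict() without refitting.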
