Marketing Analytics with R
Disclaimer: All logos, photos, etc. used in this presentation are the property of their respective copyright owners and are used here for educational purposes only.
© Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.1
Statistical Analysis Software: Introduction
Topic Definition
Definition: Software designed for in-depth analysis, unlike MS Excel (a general-purpose spreadsheet)
Origins: SAS conceived in 1966 by Anthony J. Barr
  Placed statistical procedures in a formatted file framework
Uses: Advanced statistical techniques
  Nonlinear functions; multiple regression; conjoint analysis
Advantages: Powerful; accurate; specific tools
Disadvantages: Command-line interface; steep learning curve
  Very expensive
Statistical Analysis Software: Supplier Companies
Topic Definition
Statistical Software: SAS: Market leader, especially in the Fortune 500
  SPSS: Strong in the education market (IBM)
  R: Open source
  Others: StatPac, StatSoft STATISTICA, etc.
Business Intelligence: Overall market size: $13.8B (2013); $17.1B (2016)
  2011 market share by product:
  IBM Cognos: 12.1%
  Microsoft BI: 8.7%
  Oracle Hyperion: 15.6%
  SAP Business Objects: 23.6%
  SAS Business Intelligence: 12.6%
  SPSS Modeler: 0.4%
  Others: GoodData, Panorama, Tableau, etc.
Gartner Press Release, "Gartner Says Worldwide Business Intelligence Software Revenue to Grow 7% in 2013." February 19, 2013. http://www.gartner.com/newsroom/id/2340216
SAS Press Release, "SAS in Leaders Quadrant for Business Intelligence Platforms." February 3, 2010. http://www.sas.com/news/preleases/biplatformsgartnerleader.html
Statistical Analysis Software: Supplier Companies
[Figure: Gartner Magic Quadrant, Business Intelligence, April 2011 (excerpts). Axes: Completeness of Vision (horizontal) and Ability to Execute (vertical). Quadrants: Challengers, Leaders, Niche, Visionaries. Vendors shown: Tableau, Microsoft, Oracle, IBM, SAS, SAP, Actuate, Panorama, GoodData, Jaspersoft.]
Kalakota, Ravi. PracticalAnalytics.Wordpress.Com. "Gartner Says - BI and Analytics a $12.2B Market." April 24, 2011. http://practicalanalytics.wordpress.com/2011/04/24/gartner-says-bi-and-analytics-a-10-5-bln-market/
Statistical Analysis Software: Supplier Companies
[Figure: Business Intelligence Technologies, TDWI Model. Axes: Complexity and Business Value, from low to high. Stages in order of increasing complexity and business value:
  Reporting (What happened?): Query, reporting, and search tools
  Analysis (Why did it happen?): OLAP, visualization tools
  Monitoring (What's happening now?): Dashboards, scorecards
  Prediction (What might happen?): Predictive analytics]
Kalakota, Ravi. PracticalAnalytics.Wordpress.Com. "Gartner Says - BI and Analytics a $12.2B Market." April 24, 2011. http://practicalanalytics.wordpress.com/2011/04/24/gartner-says-bi-and-analytics-a-10-5-bln-market/
Statistical Analysis Software: Major Suppliers
Criteria   SAS            SPSS            R
Market     Fortune 500    Universities    Universities
Focus      Power          Ease of use     Price
User       Power user     Student         Price-sensitive
Origins    Industry       Education       Open source
Learning   Difficult      Moderate        Moderate
Cost       $86,600/yr+    $16,000/yr+     Free
UI         Command line   Point & click   Command line
Database   32,768 var.; 1 file at a time
Graphics   SAS/Graph      High quality    Different packages
Analogy    Microsoft      Apple           Linux
UCLA, "Statistical Software Packages Comparison." ats.ucla.edu. http://www.ats.ucla.edu/stat/mult_pkg/compare_packages.htm
MineQuest Business Analytics, "Cost of Licensing WPS 3.0 vs. SAS 9.3." February 2013. http://www.minequest.com/downloads/Pricing_Comparisons_Between_WPS_and_SAS.pdf
IBM SPSS Statistics website, "Buy IBM SPSS Statistics Now." http://www-01.ibm.com/software/analytics/spss/products/statistics/buy-now.html
R: Introduction
Topic Description
Description: Free software package for statistical computing and graphics
  Widely used among statisticians and data miners
  Popularity has increased from 2010 onward
History: Started in 1993 as an implementation of the S programming language (1976)
  S offered an interactive alternative to Fortran programs
  S was developed by John M. Chambers and colleagues at Bell Laboratories
  R was developed by Ross Ihaka and Robert Gentleman
  "R" comes from Ross and Robert, and is also a play on "S"
Commercial: Revolution Analytics offers an enterprise version ($)
References: 1. Venables, W.N., Smith, D.M., "An Introduction to R." Version 3.0.1. May 16, 2013. http://www.cran.r-project.org/doc/manuals/R-intro.pdf
R: Introduction
Topic Description
Features: Variety of statistical and graphical techniques
  Distributed under the GNU GPL (General Public License)
  GNU: "GNU's Not Unix", a recursive acronym
Advantages: Free
  Powerful
  Extensible through functions and extensions
  R community noted for its active contributions
  Different graphical user interfaces (GUIs) available
Disadvantages: Can be slow and memory-hungry
  Uses a command-line interpreter; no native GUI
R: Basics
Topic Description
Commands: Case sensitive, like most UNIX-based packages
  Commands are separated by ";" or by a newline
  Compound expressions are grouped in braces: "{" and "}"
  Comments are marked with a hash sign: #Comment
Data Structure: Vector assignment:
  > x <- c(1, 2, 3, 4, 5.8)
  >   : Prompt at the beginning of the line
  <-  : Assignment operator
  c() : Function c, which combines its arguments into a vector
Class: "Numeric"; "Logical"; "Character"; "List"
Reading Data: "read.table()" function:
  > HousePrice <- read.table("houses.data")
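The vector, class, and read.table() rows above can be tried together in a short sketch. The slide's "houses.data" file is not available, so the file contents and column names below are made up for illustration and written to a temporary file first.

```r
# Vector assignment with the c() function
x <- c(1, 2, 3, 4, 5.8)

# class() reports the class of an object
class(x)               # "numeric"
class(x > 2)           # "logical"
class("houses")        # "character"
class(list(x, "a"))    # "list"

# Hypothetical stand-in for "houses.data": write a small
# whitespace-separated table to a temporary file, then read it back.
houses_file <- tempfile(fileext = ".data")
writeLines(c("Price Size",
             "52.0 111",
             "54.7 128",
             "57.5 101"),
           houses_file)
HousePrice <- read.table(houses_file, header = TRUE)
print(HousePrice)
```

Here header = TRUE is needed because the made-up file starts with a row of column names; without it, read.table() would name the columns V1 and V2.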
R: Basics
Topic Description
Function: R features a rich set of functions
  Statistics functions: mean(x); median(x); range(x); etc.
  Arithmetic: 4^2; log(10); sqrt(16)
Plots:
  > hist(x)                        # generates a default histogram
  > plot(x, y)                     # generates a quick x-y plot
  > quartz(height=4, width=10)     # makes a wide window (quartz() is available on Mac only)
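A minimal sketch of the function and plot rows above. The vector y is a made-up companion to x, and the plots are written to a temporary PDF file (an assumption made here so the lines also run without a screen); in an interactive session hist(x) and plot(x, y) open a plot window directly.

```r
x <- c(1, 2, 3, 4, 5.8)
y <- c(2, 4, 5, 8, 12)   # hypothetical second vector for the x-y plot

# Statistics functions
mean(x)      # 3.16
median(x)    # 3
range(x)     # 1.0 5.8

# Arithmetic
4^2          # 16
log(10)      # natural logarithm, about 2.302585
sqrt(16)     # 4

# Draw into a wide PDF file instead of a screen window
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file, width = 10, height = 4)
hist(x)       # default histogram
plot(x, y)    # quick x-y plot
invisible(dev.off())   # close the PDF device
```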
R: Getting Started
Topic Description
Download R: Windows: http://cran.r-project.org/bin/windows/base/
  Mac: http://cran.r-project.org/bin/macosx/
Launch R: Double-click to launch
  The prompt ">" appears in the "R Console"
New Script: Select File > New Script
  The Editor will open
  Arrange the Editor window on the left and the Console on the right
[Screenshot: RGui with an empty R Editor window on the left and the R Console on the right, showing the ">" prompt]
R: Getting Started
Topic Description
Enter Vector: "<-" is the assignment operator, read as "equal to" (it looks like an arrow)
  Example: vector <- c(2, 4, 6, 8)
Run Line: Execute (run) a line
  Highlight the line in the R Editor; click the "Run Line" icon (3rd from left)
  The "vector" assignment then appears in the Console
Toolbar icons, left to right: Open Script; Save Script; Run Line; Return focus to Console
Untitled—R Editor
vector<-c(2,4,6,8)
R Console
> vector<-c(2,4,6,8)
RGui Icons
R: Getting Started
Topic Description
Statistics: Find statistics by running each line:
  mean(vector) <RUN LINE>   (mean)
  var(vector) <RUN LINE>    (variance)
  sd(vector) <RUN LINE>     (standard deviation)
Untitled—R Editor
vector<-c(2,4,6,8)
mean(vector)
var(vector)
sd(vector)
R Console
> vector<-c(2,4,6,8)
> mean(vector)
[1] 5
> var(vector)
[1] 6.666667
> sd(vector)
[1] 2.581989
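To see where the variance and standard deviation come from, the same numbers can be rebuilt by hand. This is a sketch using the sample formulas (division by n - 1), which is what var() and sd() use; the names n, deviations, manual_var, and manual_sd are introduced here for illustration only.

```r
vector <- c(2, 4, 6, 8)
n <- length(vector)

# Sample variance: squared deviations from the mean, divided by n - 1
deviations <- vector - mean(vector)         # -3 -1  1  3
manual_var <- sum(deviations^2) / (n - 1)   # 20 / 3
manual_sd  <- sqrt(manual_var)

# Both comparisons return TRUE
all.equal(manual_var, var(vector))
all.equal(manual_sd, sd(vector))
```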
R: Getting Started
Topic Description
Directory: Load a data file into R; data is typically entered as CSV
CSV File: Comma-Separated Values; use "Save As" CSV in Excel
Example: Datafile.csv
  A, B, C, D   (identifiers)
  1, 2, 3, 4   (data for observation #1)
  2, 4, 6, 8   (data for observation #2)
  …
Load Data: Drag the CSV file and drop it into the R Console
  R will show the filepath: "C:\\My Documents\\R Files\\..."
  Type a variable name and the read command into the R Editor, then Run:
  Example <- read.csv("C:\\My Documents…", header=TRUE)
Untitled—R Editor
Datafile <- read.csv("C:\\My ..", header=TRUE)
R Console
> load("C:\\My Documents\\...")
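The CSV steps above can be sketched end to end. The slide's file path is elided, so this example does not reproduce it; instead it writes the example rows to a temporary folder (a hypothetical location chosen only for illustration) and reads them back.

```r
# Write the example rows to a temporary CSV file
csv_path <- file.path(tempdir(), "Datafile.csv")
writeLines(c("A,B,C,D",
             "1,2,3,4",
             "2,4,6,8"),
           csv_path)

# header = TRUE treats the first row as column names
Datafile <- read.csv(csv_path, header = TRUE)
print(Datafile)

# Columns are then available by name
Datafile$B           # 2 4
colMeans(Datafile)   # A 1.5, B 3.0, C 4.5, D 6.0
```

Writing header = TRUE in full is a little safer than header = T, because T is an ordinary variable that can be reassigned while TRUE cannot.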
R: Getting Started
Topic Description
Directory: Alternative approach: set up a working directory for the dataset
  A working directory allows shorter filepaths
  Windows: See "Windows Explorer help" for more info
  Mac: See "Finder help" for more info
Data: Commands for structured datasets: str; summary; fix
str(): Structure
  Shows the structure of Datafile: "data.frame: 4 obs. of 4 variables"
summary(): Summary
  Shows a summary: Min; Max; Mean; Median
fix(): Fix
  Shows the data in matrix form so that entries can be changed (fixed)
Untitled—R Editor
str(Datafile)
summary(Datafile)
fix(Datafile)
R Console
> (shows structure of datafile)
> (shows summary of datafile)
> (allows fixing of datafile)
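A sketch of the working-directory approach, using R's own getwd() and setwd() functions rather than the operating system's file browser. The temporary folder and the two-row file are assumptions made for illustration; in practice the working directory would be the folder that holds the data.

```r
# Use a temporary folder as the working directory for this sketch
old_dir <- setwd(tempdir())
writeLines(c("A,B,C,D", "1,2,3,4", "2,4,6,8"), "Datafile.csv")

getwd()                                # shows the current working directory
Datafile <- read.csv("Datafile.csv")   # short path, relative to getwd()

str(Datafile)       # 'data.frame': 2 obs. of 4 variables, then one line per column
summary(Datafile)   # Min, quartiles, Median, Mean, Max for each column
# fix(Datafile)     # opens an editable grid; interactive sessions only

setwd(old_dir)      # restore the previous working directory
```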
R: Getting Started
Topic Description
Help: Get help with the "read.csv" command:
  ?read.csv
  help(read.csv)
Help Results: help(read.csv) shows the defaults:
  read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
  Followed by explanations of the commands and parameters
Untitled—R Editor
help(read.csv)
R Console
<Opens new window with help>
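When only the defaults are needed, they can also be read in the console without opening the help window. A small sketch using the base functions args() and formals(); the variable name defaults is introduced here for illustration.

```r
# args() prints a function's argument list in the console
args(read.csv)

# formals() returns the defaults as a list that can be inspected
defaults <- formals(read.csv)
defaults$header   # TRUE
defaults$sep      # ","
defaults$dec      # "."

# help(read.csv) or ?read.csv opens the full help page (interactive use)
```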
R: Getting Started
Topic Description
Packages: Load packages when functions are missing
Load: Select "Packages" > "Load Package" from the RGui top menu
  Select CRAN mirror: USA (CA 1), UK (London), Vietnam, etc.
Install: Select "Packages" > "Install Package"
  Select the package from the scrolling list: lm() [regression analysis], ..
RGui menu bar: File; Edit; Packages; Windows; Help
R Console
> chooseCRANmirror()
> utils::menuInstallPkgs()
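The menu steps above have command-line counterparts, sketched below. Before installing anything it can help to check whether a function is already available: in a standard R installation lm() is provided by the stats package, which is attached by default. The name "somePackage" is a placeholder, not a real package, so those lines are left commented out.

```r
# find() reports which attached package provides a function
find("lm")            # "package:stats"
find("read.csv")      # "package:utils"

# Command-line counterparts of the menu entries (placeholder package name):
# install.packages("somePackage")   # download and install from a CRAN mirror
# library(somePackage)              # load the installed package

# library() works the same way for packages that ship with R
library(stats)
exists("lm")          # TRUE
```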
R: Regression
Topic Description
Data: Created dataset "RealData" of real estate values
  Data captures Price, House Size, and Lot Size for 20 houses
  Convert data to CSV format; Excel: "Save As" CSV
Load: Drag and drop the data file into the R Console
  R Console: Copy the filepath name
  R Editor: Paste the filepath name; add the read.csv command
  R Editor: Run Line
Structure: Check the structure of the dataset:
  str(RealData)
  'data.frame': 20 observations of 3 variables:
  Price: num 6 5.8 5.6 …; House: num 6.9 8 …; Lot: num 42.7…
Untitled—R Editor
RealData <- read.csv("C:\\My ..", header=TRUE)
R Console
> load("C:\\My Documents\\...")
R: Regression
Topic Description
Dependent: Set Price as the dependent variable
Explanatory: Price is a function of the explanatory variables House and Lot
Equation: Price = c1 + c2*(House Size) + c3*(Lot Size)
lm: Regression analysis in R; stands for Linear Model
Syntax: lm(Dependent ~ Independent + Independent, Dataset)
Equation: lm(Price ~ House + Lot, RealData)
  Type into the R Editor; Run Line
  See results in the R Console
Untitled—R Editor
lm(Price~House+Lot,RealData)
R Console
> lm(Price~House+Lot,RealData)
(Intercept)        House          Lot
   -0.55415      0.64680      0.02763
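The RealData file itself is not included here, so the following sketch uses made-up data to show the same lm() call pattern. Price is constructed as an exact linear function of House and Lot (with coefficients 1, 2, and 0.5 chosen arbitrarily), which makes it easy to check that lm() recovers c1, c2, and c3. The names SketchData and fit are hypothetical.

```r
# Hypothetical data (not the RealData values)
SketchData <- data.frame(
  House = c(5, 6, 7, 8, 9, 10),
  Lot   = c(40, 35, 50, 45, 60, 42)
)
SketchData$Price <- 1 + 2 * SketchData$House + 0.5 * SketchData$Lot

# Dependent variable on the left of "~", explanatory variables on the right
fit <- lm(Price ~ House + Lot, data = SketchData)

coef(fit)   # approximately (Intercept) = 1, House = 2, Lot = 0.5

# predict() applies the fitted equation to a new observation:
# 1 + 2 * 7.5 + 0.5 * 48 = 40
predict(fit, newdata = data.frame(House = 7.5, Lot = 48))
```

Note that the formula needs the tilde: Price ~ House + Lot. Writing Price + House + Lot instead is not a model formula and lm() will not fit the intended regression.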
R: Regression
Topic Description
Results: Compare results from R with those from Excel:
  Method   Intercept   House Size   Lot Size
  Excel    -0.554      +0.646       +0.027
  R        -0.55415    +0.64680     +0.02763
Interpretations: R results are the same as those from Excel
  House size is an important factor when assessing price
  Lot size is not as important
Statistics: Option of calculating regression statistics:
  RealReg <- lm(Price~House+Lot, RealData)
  summary(RealReg)
  Gives significance codes, R-squared, F-statistic
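The pieces of a summary() result can also be pulled out individually. The sketch below again uses made-up data (not RealData), this time with a small fixed disturbance added to Price so that the fit is not exact; SketchData, noise, SketchReg, and reg_stats are hypothetical names.

```r
# Hypothetical data with a small fixed disturbance added to Price
SketchData <- data.frame(
  House = c(5, 6, 7, 8, 9, 10, 11, 12),
  Lot   = c(40, 35, 50, 45, 60, 42, 55, 38)
)
noise <- c(0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1)
SketchData$Price <- 1 + 2 * SketchData$House + 0.5 * SketchData$Lot + noise

SketchReg <- lm(Price ~ House + Lot, data = SketchData)
reg_stats <- summary(SketchReg)

reg_stats$coefficients   # estimates, standard errors, t values, p-values
reg_stats$r.squared      # share of the variance in Price explained by the model
reg_stats$fstatistic     # F value with its degrees of freedom
```

Printing reg_stats as a whole gives the familiar report, including the significance codes next to each coefficient.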