
What is R? R’s Advantages R’s Disadvantages Installing and ... · Vast selection of analytics & graphics New methods are available sooner Many packages can run R (SAS, SPSS, Excel…)


Bob Muenchen, Author R for SAS and SPSS Users, Co-Author R for Stata Users

[email protected], http://r4stats.com

Copyright © 2010, 2011, Robert A Muenchen. All rights reserved.

What is R?
R’s Advantages
R’s Disadvantages
Installing and Maintaining R
Ways of Running R
An Example Program
Where to Learn More

“The most powerful statistical computing language on the planet.” -Norman Nie, Developer of SPSS

Language + package + environment for graphics and data analysis
Free and open source
Created by Ross Ihaka & Robert Gentleman, 1996, & extended by many more
An implementation of the S language by John Chambers and others
R has 4,950 add-ons, or nearly 100,000 procs


Source: http://r4stats.com/popularity

1. Data input & management (data step)
2. Analytics & graphics procedures (proc step)
3. Macro language
4. Matrix language
5. Output management systems (ODS/OMS)

R integrates all of these seamlessly.

* SAS Approach;
DATA A; SET A;
  logX = log(X);
PROC REG;
  MODEL Y = logX;

# R Approach
lm( Y ~ log(X) )
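
The comparison above can be tried out with a small sketch. The data here are simulated purely for illustration (the variable names X and Y follow the slide, but the values, seed, and sample size are assumptions). It fits the model twice, once with a precomputed logX as in the SAS approach and once with the transformation inside the formula, and prints both sets of coefficients.

```r
# Simulated data for illustration only; the values are not from the slides.
set.seed(42)
X <- runif(50, min = 1, max = 100)
Y <- 2 + 3 * log(X) + rnorm(50, sd = 0.5)

# Two-step style, mirroring the SAS data step followed by PROC REG.
logX <- log(X)
fit_two_step <- lm(Y ~ logX)

# One-step style: the transformation lives inside the model formula.
fit_one_step <- lm(Y ~ log(X))

# Both fits estimate the same intercept and slope.
print(coef(fit_two_step))
print(coef(fit_one_step))
```

Only the coefficient names differ (logX versus log(X)); the estimates themselves agree.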


Vast selection of analytics & graphics
New methods are available sooner
Many packages can run R (SAS, SPSS, Excel…)
Its object orientation “does the right thing”
Its language is powerful & fully integrated
Procedures you write are on an equal footing
It is the universal language of data analysis
It runs on any computer
Being open source, you can study and modify it
It is free

* Using SAS;
PROC TTEST DATA=classroom;
  CLASS gender;
  VAR score;
RUN;

# In R
t.test(score ~ gender, data=classroom)

# Paired test: the non-formula form of t.test() ignores data=, so use with()
with(classroom, t.test(posttest, pretest, paired=TRUE))
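
A runnable sketch of both tests follows. The classroom data frame is hypothetical, invented here only so the calls have something to work on (the column names follow the slide, the values do not come from it, and score is simply set equal to posttest). The same t.test() function handles both the independent-samples and the paired case.

```r
# Hypothetical classroom data, invented purely for illustration.
classroom <- data.frame(
  gender   = factor(c("f", "f", "f", "f", "m", "m", "m", "m")),
  pretest  = c(70, 65, 80, 75, 60, 72, 68, 77),
  posttest = c(78, 70, 85, 74, 66, 80, 71, 83)
)
classroom$score <- classroom$posttest

# Independent-samples test: the formula method accepts data=.
by_gender <- t.test(score ~ gender, data = classroom)

# Paired test: evaluate the two columns inside the data frame with with().
paired <- with(classroom, t.test(posttest, pretest, paired = TRUE))

print(by_gender)
print(paired)
```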

Language is somewhat harder to learn
Help files are sparse & complex
Must find R and its add-ons yourself
Graphical user interfaces not as polished
Most R functions hold data in main memory
  Rule-of-thumb: 10 million values per gigabyte
  SAS/SPSS: billions of records
Several efforts underway to break R’s memory limit, including Revolution Analytics’ distribution
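
To put the memory rule of thumb in context, the sketch below measures how much raw storage numeric values take in R. It only shows the raw cost of holding the data (8 bytes per double-precision value). The reading that the rule of thumb leaves headroom for the working copies R makes during analysis is an assumption, not something the slides state.

```r
# Measure the memory used by one million numeric (double) values.
n_values <- 1e6
bytes_used <- as.numeric(object.size(numeric(n_values)))
bytes_per_value <- bytes_used / n_values

# Raw storage needed for 10 million values, in megabytes.
raw_mb <- 1e7 * bytes_per_value / 1024^2

cat("Bytes per numeric value:", round(bytes_per_value, 2), "\n")
cat("Raw storage for 10 million values (MB):", round(raw_mb, 1), "\n")
```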


Base R plus Recommended Packages like:
  Base SAS, SAS/STAT, SAS/GRAPH, SAS/IML Studio
  SPSS Stat. Base, SPSS Stat. Advanced, Regression
Tested via extensive validation programs
But add-on packages written by…
  Professor who invented the method?
  A student interpreting the method?

Email support is free, quick, 24-hours: www.r-project.org/mail.html
Stackoverflow.com
Quora.com
Crossvalidated: stats.stackexchange.com/questions/tagged/r
Phone support available commercially

1. Go to cran.r-project.org, the Comprehensive R Archive Network
2. Download binaries for Base & run
3. Add-ons: install.packages("myPackage")
4. To update: update.packages()
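
Step 3 is often wrapped so that a package is installed only when it is missing. The sketch below is one way to do that. The helper names missing_packages and install_if_missing are hypothetical, not part of R, and the installing function is defined but not called because it needs network access and a CRAN mirror.

```r
# Return the names of packages that are not yet installed.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Install only what is missing. This needs network access and a CRAN
# mirror, so it is defined here but not called.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# "stats" ships with R; "notARealPackage12345" is a made-up name.
print(missing_packages(c("stats", "notARealPackage12345")))
```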

Comprehensive R Archive Network
Crantastic.com
Inside-R.org
R4Stats.com


Run code interactively
Submit code from Excel, SAS, SPSS,…
Point-n-click using Graphical User Interfaces (GUIs)
Batch mode


run ExportDataSetToR("mydata");
submit/r;
  mydata$workshop <- factor(mydata$workshop)
  summary(mydata)
endsubmit;

GET FILE='mydata.sav'.
BEGIN PROGRAM R.
mydata <- spssdata.GetDataFromSPSS(
  variables = c("workshop gender q1 to q4"),
  missingValueToNA = TRUE,
  row.label = "id" )
summary(mydata)
END PROGRAM.


Revolution Analytics is a company focused on R development & support
Run by SPSS founder Norman Nie
Their enhanced distribution of R: Revolution R Enterprise
Free for colleges and universities, including for outside consulting


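# Read the data and display it
mydata <- read.csv("mydata.csv")
print(mydata)

# Treat workshop as a categorical variable
mydata$workshop <- factor(mydata$workshop)
summary(mydata)

# Scatterplot with q1 on the x-axis and q4 on the y-axis
plot( mydata$q1, mydata$q4 )

# Regress q4 on q1, q2 and q3
myModel <- lm( q4~q1+q2+q3, data=mydata )
summary( myModel )
anova( myModel )
plot( myModel )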

> mydata <- read.csv("mydata.csv")
> print(mydata)
  workshop gender q1 q2 q3 q4
1        1      f  1  1  5  1
2        2      f  2  1  4  1
3        1      f  2  2  4  3
4        2   <NA>  3  1 NA  3
5        1      m  4  5  2  4
6        2      m  5  4  5  5
7        1      m  5  3  4  4
8        2      m  4  5  5  5

> mydata$workshop <- factor(mydata$workshop)
> summary(mydata)
 workshop  gender
 1:4       f   :3
 2:4       m   :4
           NA's:1

       q1              q2              q3               q4
 Min.   :1.00    Min.   :1.00    Min.   :2.000    Min.   :1.00
 1st Qu.:2.00    1st Qu.:1.00    1st Qu.:4.000    1st Qu.:2.50
 Median :3.50    Median :2.50    Median :4.000    Median :3.50
 Mean   :3.25    Mean   :2.75    Mean   :4.143    Mean   :3.25
 3rd Qu.:4.25    3rd Qu.:4.25    3rd Qu.:5.000    3rd Qu.:4.25
 Max.   :5.00    Max.   :5.00    Max.   :5.000    Max.   :5.00
                                 NA's   :1.000
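
The summary output shows counts for workshop but quartiles for q1 to q4 because workshop was converted to a factor first. A small sketch of that difference, using the same eight workshop codes typed in by hand (the variable names numeric_summary and factor_summary are illustrative):

```r
# Workshop codes as they appear in the example data (1 and 2).
workshop <- c(1, 2, 1, 2, 1, 2, 1, 2)

# Numeric summary: min, quartiles, mean, max.
numeric_summary <- summary(workshop)

# Factor summary: a count per level.
factor_summary <- summary(factor(workshop))

print(numeric_summary)
print(factor_summary)
```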


> myModel <- lm(q4 ~ q1+q2+q3, data=mydata)
> summary(myModel)

Call:
lm(formula = q4 ~ q1 + q2 + q3, data = mydata)

Residuals:
      1       2       3       5       6       7       8
-0.3113 -0.4261  0.9428 -0.1797  0.0765  0.0225 -0.1246

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.3243     1.2877  -1.028    0.379
q1            0.4297     0.2623   1.638    0.200
q2            0.6310     0.2503   2.521    0.086
q3            0.3150     0.2557   1.232    0.306

Multiple R-squared: 0.9299, Adjusted R-squared: 0.8598
F-statistic: 13.27 on 3 and 3 DF, p-value: 0.03084
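
For readers without mydata.csv, the sketch below rebuilds the eight printed rows as a data frame and refits the model, then pulls individual pieces out of the result instead of reading them off the printed summary. The data values are copied from the listing above. The names n_used, estimates, and r_squared are illustrative.

```r
# The eight rows printed in the example, typed in by hand.
mydata <- data.frame(
  workshop = factor(c(1, 2, 1, 2, 1, 2, 1, 2)),
  gender   = factor(c("f", "f", "f", NA, "m", "m", "m", "m")),
  q1 = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2 = c(1, 1, 2, 1, 5, 4, 3, 5),
  q3 = c(5, 4, 4, NA, 2, 5, 4, 5),
  q4 = c(1, 1, 3, 3, 4, 5, 4, 5)
)

myModel <- lm(q4 ~ q1 + q2 + q3, data = mydata)
model_summary <- summary(myModel)

# Row 4 has a missing q3, so lm() drops it by default (na.action = na.omit).
n_used <- nobs(myModel)

# Pull individual results out of the fitted objects.
estimates <- coef(myModel)
r_squared <- model_summary$r.squared

print(n_used)
print(round(estimates, 4))
print(round(r_squared, 4))
```

The dropped row explains why the residuals in the printed output are labelled 1, 2, 3, 5, 6, 7, 8, with no entry for observation 4.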


R for SAS and SPSS Users, Muenchen
R for Stata Users, Muenchen & Hilbe
R Through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics, Heiberger & Neuwirth
Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Williams

R is powerful, extensible, free
Download it from CRAN
Academics download Revolution R Enterprise for free at www.revolutionanalytics.com
You run it many ways & from many packages
Several graphical user interfaces are available
R's programming language is the way to access its full power

[email protected]

Slides: r4stats.com/misc/webinar
Presentation: bit.ly/R-sas-spss