Introduction to R: A Free Statistical/Mathematical Software
Wanhua Su
Department of Mathematics and Statistics
MacEwan University
Wanhua Su (MacEwan) Introduction to R FD Day, August 26, 2015 1 / 29
Outline
1 Package R Commander in R
    Download R and R Commander
    About the R Commander Package
    Examples
2 Command-Driven R
    Using R or R Studio as a Calculator
    Function and Assignment
    Vector and Matrix
    Probability functions
    Plot in R
    Write Your Own R Function
    Examples
An Introduction to R
This workshop consists of two parts:
R and R Commander (a point-and-click interface similar to Minitab and SPSS), with examples.
Command-driven R, with more details on coding.
About R and R Commander
R is free software for statistical computing and graphics, created by Ross Ihaka and Robert Gentleman.
R is a modern implementation of the S language developed by John Chambers.
R is command-driven and includes packages contributed by scientists (statisticians) all over the world.
R Commander is an R package providing a point-and-click interface similar to Minitab or SPSS.
Obtain the Software
Download R or R Studio: “http://www.r-project.org/”.
Install R (available for Windows, Unix, and Mac).
Start R: In Windows and Mac, click “R”; under Unix, type “R”.
For help: use ? or help(). For example, to check how to use lm, the R function for linear models, type ?lm or help(lm).
To quit an R session: In Windows and Mac, close the window; under Unix, type “q()”.
The R Commander Package
The R package Rcmdr provides a point-and-click interface to R, which is otherwise command-driven.
Install the package: Packages→Install package(s)→choose a mirror→double click the package Rcmdr.
Load the package: Type library(Rcmdr) in the console window.
Read data into R: use Data from the toolbar, very similar to Minitab.
The R commands are shown in the R Script window.
The toolbar includes Graphs for plots such as histograms and scatterplots; Statistics for numerical summaries and inferences on means, proportions, and variances; Models for model selection and building; Distributions for probability distributions.
Examples
Example 1: prices of 15 used cars. Two variables: age (year) and price ($). Import the data set car.txt. Can we use a simple linear regression to model their relationship?
Example 2: An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Data set chickwts in R package datasets. Two variables: weight and feed type on 71 chickens.
◮ Side-by-side box plots of weights by feed type.
◮ What method would you like to use? Is there any difference between feed types?
◮ Any other analysis to follow up?
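One possible way to approach Example 2 from the command line is sketched below. The choice of a one-way ANOVA followed by Tukey's HSD is an assumption for illustration, not necessarily the analysis used in the workshop; the object names fit and p_value are hypothetical.

```r
# chickwts ships with R in the datasets package (71 chickens, 6 feed types)
data(chickwts)

# Side-by-side box plots of weight by feed type
boxplot(weight ~ feed, data = chickwts,
        xlab = "Feed type", ylab = "Weight (g)")

# One-way ANOVA: do mean weights differ across feed types?
fit <- aov(weight ~ feed, data = chickwts)
summary(fit)

# Extract the p-value of the overall F test from the ANOVA table
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

# Follow-up: Tukey's honest significant differences between pairs of feeds
TukeyHSD(fit)
```

A small p_value suggests that at least one feed type differs in mean weight; the TukeyHSD() output then indicates which pairs differ.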
Using R or R Studio as a Calculator
Arithmetic
> 2+1
[1] 3
> 2^2
[1] 4
> (3-1)*3
[1] 6
> 1-2*3
[1] -5
> 2/3
[1] 0.6666667
Trick 1: Try to save your R code in a .R file. You can copy the lines you would like to run and paste them in the R Console window.
◮ In Windows, go to Menu “File” and choose “New Script” to create a new .R file or choose “Open Script...” to open an existing one.
◮ In Mac, go to Menu “File” and choose “New Document” to create a new .R file or choose “Open Document...” to open an existing one.
Function and Assignment
R functions
> sqrt(2) #square root of 2
[1] 1.414214
> exp(1) #exponential function
[1] 2.718282
> sin(2) #sin function
[1] 0.9092974
> log(10) #natural logarithms of 10
[1] 2.302585
Assignment: use = or <-
> x=2
> x+3
[1] 5
> x<-exp(1)
> x
[1] 2.718282
Trick 2: use # to add some explanation to the R code. Everything after # on the same line is ignored by R.
Vector Assignment
Use c() to create a vector.
> Age=c(1,1,3,4,4)
> Price=c(13990,13495,12999,9500,10495)
Use “,” to separate data.
Combine vectors:
◮ to form a new vector
> c(Age,Price)
[1] 1 1 3 4 4 13990 13495 12999 9500 10495
◮ to form a matrix
> cbind(Age,Price)
     Age Price
[1,]   1 13990
[2,]   1 13495
[3,]   3 12999
[4,]   4  9500
[5,]   4 10495
> rbind(Age,Price)
       [,1]  [,2]  [,3] [,4]  [,5]
Age       1     1     3    4     4
Price 13990 13495 12999 9500 10495
Creating Structured Data
Use :
> 1:10 #1 to 10, increase by 1
[1] 1 2 3 4 5 6 7 8 9 10
> 10:1 #10 to 1, decrease by 1
[1] 10 9 8 7 6 5 4 3 2 1
Use seq()
> seq(1,9,by=2) #sequence from 1 to 9
[1] 1 3 5 7 9
> seq(1,10,by=2) #common difference=2
[1] 1 3 5 7 9
> seq(1,9,length=5) #of length 5
[1] 1 3 5 7 9
> seq(1,10,length=5)
[1] 1.00 3.25 5.50 7.75 10.00
Vector Operation in R is Element-wise
> x=c(1,2,3)
> x
[1] 1 2 3
> y=c(5,2,3)
> y
[1] 5 2 3
> x+y # we get 1+5, 2+2, 3+3
[1] 6 4 6
> x/y # we get 1/5, 2/2, 3/3
[1] 0.2 1.0 1.0
> x*y # we get 1*5, 2*2, 3*3
[1] 5 4 9
> x^2 # we get 1^2, 2^2, 3^2
[1] 1 4 9
Functions on Vectors
> vec=c(2,1,3)
> length(vec) #length of a vector
[1] 3
> sum(vec) #sum of the observations in a vector
[1] 6
> mean(vec) #mean of the observations in a vector
[1] 2
> var(vec) #sample variance of 3 observations
[1] 1
> sort(vec) #sort 3 observations in increasing order
[1] 1 2 3
> min(vec) #minimum value of 3 observations
[1] 1
> max(vec) #maximum value of 3 observations
[1] 3
> summary(vec) #summary of the observations
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 1.5 2.0 2.0 2.5 3.0
Accessing Data Using Indices
> vec=1:5
> vec[1] #1st element of vec
[1] 1
> vec[1:4] #the first four elements
[1] 1 2 3 4
> vec[c(1,3,5)] #the 1st, 3rd and 5th elements
[1] 1 3 5
> vec[-(1:2)] #all except the first 2 elements
[1] 3 4 5
> vec[vec>3] #all elements of vec greater than 3
[1] 4 5
Create Matrix Using matrix()
Use matrix(a, m, n) to generate an m × n matrix with elements a.
Use diag(n) to generate an n × n identity matrix.
> matrix(1,2,3) #a 2 by 3 matrix with all elements 1
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
> vec=c(1:6)
> matrix(vec,2,3) #create a 2 by 3 matrix by columns
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> matrix(vec,2,3,byrow=T) #by rows
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
> diag(3) #3 by 3 identity matrix
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
Take Elements From a Matrix
> out=cbind(c(1,2,3),c(4,5,6),c(7,8,9))
> out
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> out[1:2,] #Take the 1st row and 2nd row
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
> out[,c(1,3)] #Take the 1st column and 3rd column
[,1] [,2]
[1,] 1 7
[2,] 2 8
[3,] 3 9
> out[1,3] #Take (1,3) element of the matrix out
[1] 7
Matrix Operation (1)
> x =cbind(c(1,2),c(3,4)) #combine by column
> x
[,1] [,2]
[1,] 1 3
[2,] 2 4
> y =rbind(c(6,8),c(7,9)) #combine by row
> y
[,1] [,2]
[1,] 6 8
[2,] 7 9
> x+y # return the sum of x and y
[,1] [,2]
[1,] 7 11
[2,] 9 13
Matrix Operation (2)
> x%*%y # return the matrix product of x and y
[,1] [,2]
[1,] 27 35
[2,] 40 52
>
> x*y # not matrix product of x and y
[,1] [,2]
[1,] 6 24
[2,] 14 36
>
> t(x) # transpose of x
[,1] [,2]
[1,] 1 2
[2,] 3 4
Matrix Operation (3)
> det(x) #determinant of x
[1] -2
> solve(x) #inverse of x
[,1] [,2]
[1,] -2 1.5
[2,] 1 -0.5
> x%*%solve(x) #check
[,1] [,2]
[1,] 1 0
[2,] 0 1
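solve() can also solve a linear system directly, which avoids forming the inverse first. A minimal sketch follows; the right-hand side b is a made-up vector for illustration.

```r
x <- cbind(c(1, 2), c(3, 4))   # the same 2 by 2 matrix x used on these slides
b <- c(5, 6)                   # hypothetical right-hand side

# Solve x %*% sol = b without explicitly computing the inverse
sol <- solve(x, b)

# Same answer via the inverse, for comparison
sol_inv <- solve(x) %*% b

# Check: multiplying back recovers b
check <- as.vector(x %*% sol)
check
```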
Obtain Data From a File: Use read.table()
Suppose the data have been saved in the file car.txt.
> data=read.table("/Users/wanhuasu/documents/MacEwan/talk/R/car.txt",
header=T)
> data
Age Price
1 1 13990
2 1 13495
3 3 12999
4 4 9500
5 4 10495
6 5 8995
7 5 9495
8 6 6999
9 7 6950
10 7 7850
11 8 6999
12 8 5995
13 10 4950
14 10 4495
15 13 2850
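Because the path above is specific to the author's machine, the sketch below writes the first five rows of the car data to a temporary file and reads them back. The temporary file and the name car_data are stand-ins for illustration.

```r
# Write a small whitespace-separated file with a header line
tmp <- tempfile(fileext = ".txt")
writeLines(c("Age Price",
             "1 13990",
             "1 13495",
             "3 12999",
             "4 9500",
             "4 10495"), tmp)

# header = TRUE tells read.table that the first line holds variable names
car_data <- read.table(tmp, header = TRUE)

str(car_data)        # a data frame with two integer columns
car_data$Price       # access a column by name
mean(car_data$Age)   # 2.6
```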
Probability Functions
R has four types of functions for getting information about a family of distributions.
◮ “d” function: returns the pdf (or pmf, for discrete distributions);
◮ “p” function: returns the cdf;
◮ “q” function: returns the quantiles;
◮ “r” function: returns random observations.
The R names
◮ binomial distribution: binom
◮ Poisson distribution: pois
◮ normal distribution: norm
◮ t-distribution: t
◮ F-distribution: f
◮ χ²-distribution: chisq
Prefix each name with d, p, q, or r to get the four functions for that distribution, for example dnorm, pnorm, qnorm, and rnorm.
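As an illustration of the naming scheme, here are a few calls for the normal, binomial, and t distributions. The particular arguments are arbitrary examples.

```r
dnorm(0)                          # standard normal density at 0, about 0.3989
pnorm(1.96)                       # P(Z <= 1.96), about 0.975
qnorm(0.975)                      # 97.5th percentile, about 1.96

set.seed(1)                       # make the random draws reproducible
z <- rnorm(5, mean = 10, sd = 2)  # five draws from a normal with mean 10, sd 2

dbinom(3, size = 10, prob = 0.5)  # P(X = 3) for X ~ Binomial(10, 0.5)
pbinom(3, size = 10, prob = 0.5)  # P(X <= 3)
qt(0.975, df = 14)                # upper 2.5% t critical value, 14 df
```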
Plot a Graph in R
x=seq(-2*pi,2*pi,length=100)
y=sin(x)
z=cos(x)
plot(x,y,type="p") #scatter plot, type=points
plot(x,y,type="l") #scatter plot, type=line
plot(x,y,type="b") #scatter plot, type=line and points
plot(x,y,pch=19,xlab="x",ylab="f(x)")
#plot the points using solid circles
lines(x,y,lty=1,col="red") #add a red solid line
points(x,z,pch=2) #add points using triangles
lines(x,z,lty=2,col="blue") #add a blue dashed line
title("Sin & Cos") #add a title
par(mfrow=c(2,3)) #plot 6 graphs, 2 rows and 3 columns
Two Ways to Save a Graph
Click on the graph, go to the menu “File”, choose “Save as”, then choose the file type and the place you would like to save.
Use postscript() or pdf() to start the graphics device, plot the graph,and then close the device by dev.off().
> postscript(file="/Users/wanhuasu/documents/MacEwan/talk/R/plot.ps",
+ height=8,width=8,horizontal=F) #start the graphic device
> par(mfrow=c(2,1)) #two figures in one plot
> plot(x,y,pch=1,xlab="x",ylab="sin(x)")
> lines(x,y,lty=1,col="red")
> title("Sin(x)") #title of the first figure
> plot(x,z,pch=1,xlab="x",ylab="cos(x)")
> lines(x,z,lty=1,col="blue")
> title("Cos(x)") #title of the second figure
> dev.off() #close the graphic device
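The same pattern works with pdf(). The sketch below writes to a temporary file so that it runs on any machine; the output location out_file is an assumption for illustration.

```r
x <- seq(-2 * pi, 2 * pi, length.out = 100)
out_file <- tempfile(fileext = ".pdf")       # hypothetical output location

pdf(file = out_file, width = 8, height = 8)  # start the graphics device
par(mfrow = c(2, 1))                         # two figures on one page
plot(x, sin(x), type = "l", col = "red", xlab = "x", ylab = "sin(x)")
title("Sin(x)")
plot(x, cos(x), type = "l", col = "blue", xlab = "x", ylab = "cos(x)")
title("Cos(x)")
dev.off()                                    # close the device to finish the file

file.exists(out_file)                        # confirm the file was written
```

Nothing appears on screen while the device is open; the plot is only written to the file once dev.off() is called.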
Write Your Own R Functions
Write a function to standardize a vector to have mean 0 and variance 1.
> std=function(x){
+ m=mean(x)
+ s=sqrt(var(x))
+ result=(x-m)/s
+ return(result)
+ }
> x=1:5
> y=std(x)
> y
[1] -1.2649111 -0.6324555 0.0000000 0.6324555 1.2649111
> mean(y)
[1] 0
> var(y)
[1] 1
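For comparison, base R's scale() performs the same standardization, and sd() is equivalent to sqrt(var()). In the short sketch below, std is redefined only so that the block is self-contained; y_scale is a name introduced for illustration.

```r
std <- function(x) {
  (x - mean(x)) / sd(x)   # sd(x) equals sqrt(var(x))
}

x <- 1:5
y <- std(x)

# scale() returns a one-column matrix with attributes; as.vector() drops them
y_scale <- as.vector(scale(x))

all.equal(y, y_scale)     # the two approaches agree
```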
Summarize the Data (Stat 151)
Annual income from 40 U.S. households. Two factors: race (white and black) and region (Northeast, Midwest, South, West)
Use hist(), stem(), qqnorm(), boxplot() to plot the data
Use table(), apply(), tapply() to summarize the data
data=read.table("/Users/wanhuasu/documents/MacEwan/stat252/data/income.txt",
sep="",header=T)
attach(data)
hist(income) #histogram of the variable income
stem(income) #stem-and-leaf diagram
boxplot(income~race,data=data) #side-by-side boxplot, by race
> tapply(income,race,mean) #sample means for levels of race
Black White
27.820 41.195
> tapply(income,region,mean) #sample means for region
MWest NEast South West
32.95 37.06 31.38 36.64
> table(race,region) #contingency table by race and region
region
race MWest NEast South West
Black 5 5 5 5
White 5 5 5 5
Linear Regression: Price of Used Cars
> data=read.table("/Users/wanhuasu/documents/MacEwan/talk/R/car.txt",header=T)
> age=data[,1]
> price=data[,2]
> model=lm(price~age,data=data) #fit a linear model
> names(model) #model is an object
[1] "coefficients" "residuals" "effects"
[4] "rank" "fitted.values" "assign"
[7] "qr" "df.residual" "xlevels"
[10] "call" "terms" "model"
> summary(model) #summary of the model
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14285.95 448.67 31.84 1.01e-13 ***
age -959.05 64.58 -14.85 1.56e-09 ***
The fitted regression is P̂rice = 14285.95 − 959.05 × Age
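To make the fit reproducible without car.txt, the sketch below enters by hand the 15 observations printed on the read.table() slide and then uses predict() for a new age. The names cars15, fit, and new_car, and the age of 5 years, are assumptions for illustration.

```r
cars15 <- data.frame(
  Age   = c(1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13),
  Price = c(13990, 13495, 12999, 9500, 10495, 8995, 9495, 6999,
            6950, 7850, 6999, 5995, 4950, 4495, 2850)
)

fit <- lm(Price ~ Age, data = cars15)
coef(fit)                       # intercept and slope of the least squares line

# Predicted price of a 5-year-old car, with a 95% prediction interval
new_car <- data.frame(Age = 5)
pred <- predict(fit, newdata = new_car, interval = "prediction")
pred
```

The coefficients should match the summary table above up to rounding.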
Least Squares Line: Price of Used Cars
plot(age,price,xlab="Age (year)",ylab="Price ($)",cex.lab=1.3)
abline(coef=model$coef,col="red",lwd=2)
title("Scatterplot of Used Car",cex.main=1.5)
[Figure: “Scatterplot of Used Car”. x-axis: Age (year); y-axis: Price ($); data points with the fitted least squares line in red.]
Example: Two-Way ANOVA (Stat 252)
Annual income from 40 U.S. households. Two factors: race (white and black) and region (Northeast, Midwest, South, West)
> data=read.table("/Users/wanhuasu/documents/MacEwan/stat252/data/income.txt",sep="",header=T)
> model=lm(income~race+region+race*region,data=data)
> anova(model)
Analysis of Variance Table
Response: income
Df Sum Sq Mean Sq F value Pr(>F)
race 1 1788.9 1788.9 10.8733 0.002395 **
region 3 232.7 77.6 0.4715 0.704296
race:region 3 38.8 12.9 0.0787 0.971107
Residuals 32 5264.7 164.5
Conclusions: no interaction effect, no main effect due to region, main effect due to race is significant.
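A note on the model formula: in R, race*region already expands to race + region + race:region, so the formula above is equivalent to income ~ race*region. This can be checked without any data; the names f_long and f_short are introduced only for this sketch.

```r
f_long  <- income ~ race + region + race * region
f_short <- income ~ race * region

# term.labels lists the model terms after formula expansion
terms_long  <- attr(terms(f_long),  "term.labels")
terms_short <- attr(terms(f_short), "term.labels")

terms_long                          # "race" "region" "race:region"
identical(terms_long, terms_short)  # both formulas give the same terms
```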
Acknowledgments
Thanks for your time and attention.
The slides of this talk are available on my webpage: http://academic.macewan.ca/suw3/
A more detailed tutorial on R can be found on the R webpage: http://www.r-project.org/
Comparison of R and Matlab: http://mathesaurus.sourceforge.net/octave-r.html