A Crash Course in R (and other notes)
Professor Russell Steele (1), edited, with permission, by Dr. Pierre-Jerome Bergeron (2)
(1) Department of Mathematics and Statistics, McGill University
(2) Department of Statistics and Actuarial Sciences, University of Waterloo
A Crash Course in R (and other notes) – p. 1/87
Intro to statistical computing ideas
Statistical computation is a tool, not a proof
Computational results can be used to develop intuition, but not to confirm
Computational results are also valuable as a form of data analysis
Think before you program
More time is wasted on debugging badly written programs than on anything else
Will this program be used again on another dataset or as part of another program?
Will other people be using this code?
Can I compartmentalize the programming so that it’s easier to improve speed and/or efficiency without completely re-writing code?
Has anyone else created the tools that I need to accomplish my task?
Programming bit by bit
Always have a small test dataset for which you know the correct answer
Always test the program as you write it
Try to keep things simple and specific at first, then generalize
Document, document, document
Write usable re-usable code
Find the balance between hardwiring inputs and using arguments
Make sure you’re using the right platform for your problem
Write code that is abstract but still not annoyingly vague
Open discussion points
Need to familiarize yourself with the computing environment
You can get R for free from http://cran.r-project.org/ (Windows and Linux versions); upgrading to the newest version is a good idea
If you wish to incorporate C code, you may also need GNU compilers (easy on Mac and Linux, more in the coming weeks for Windows).
Text editor and other applications
VI and Emacs are two common editors for Mac OS X and Linux/UNIX systems, although any text editor will do
XEmacs is also available for Windows
On Mac OS X, Xcode allows for editing and compiling, similar to Bloodshed Dev-C++
Eclipse, KDevelop and Anjuta are available for Linux
First ‘mission’
If you haven’t done so, download the R statistical package and install it on your own machine
Check to see if you have the necessary development software installed on your machine
Basic R stuff
Today: Quick review of basic R stuff (large bore potential)
Vectors
Matrices
Functions
Useful commands
Frames, arrays, lists
The basic R files
Anytime you start R, the application looks for two files, .Rhistory and .RData
Each .RData workspace is different
Advantage: You can have several, separate workspaces for functions, datasets, etc. that can be transferred from machine to machine
Disadvantage: You can have several, separate...
The basic R files
For Unix/Linux platforms, if you are starting R in a directory where these files do not exist, it will create new ones for you
For Windows users, the workspaces can be saved in weird places (cause Windows is weird); use File -> Load Workspace to find where it is stored on your particular machine
Save workspaces on Windows using File -> Save Workspace; on Unix/Linux rename the .RData file after saving and exiting R in order to keep from opening it again the next time.
.RData files can be used on different platforms interchangeably
Other fundamental R stuff
The .Rhistory file keeps all commands for a given workspace; the history() command lets you access them.
ls() will tell you the contents of the currently loaded workspace
library( libraryname ) will allow you to access the functions and data in the libraryname library
http://cran.r-project.org/ has a list of libraries and packages that one can download that are not part of the R base package
Vectors
Building blocks of R
Create a vector with the function c( ... )
Example:
> firstvec<-c(89,10,390.38)
Can access parts of a vector using subscripts
Vectors (cont)
By using vectorname[i], where i is the ith position in the vector, we can either print:
> firstvec[2]
[1] 10
or change the contents of the vector:
> firstvec[3]<-271.401
> firstvec
[1] 89.000 10.000 271.401
Variables
When I talk about variables, just think about variables as vectors with one element
Example:
> firstvar<-5
> firstvar
[1] 5
Useful for indices and constants that you’ll need later on
Vector Operations
Using operations such as ’+’, ’-’, ’*’, ’/’ can be tricky depending on the lengths of the vectors you’re trying to use:
If the vectors are the same length, the operation is performed elementwise
If one vector contains one element, then the operation is performed for each element of the vector
Vector Operations
> 5*firstvec
[1] 445.000 50.000 1357.005
> c(5)*firstvec
[1] 445.000 50.000 1357.005
> c(5,2)*firstvec
[1] 445.000 20.000 1357.005
Warning message:
longer object length
is not a multiple of shorter object length in: c(5, 2) * firstvec
> c(5,2,3)*firstvec
[1] 445.000 20.000 814.203
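When two vectors differ in length, R applies its recycling rule: the shorter vector is repeated until it is as long as the longer one, and a warning appears only when the lengths do not divide evenly. A minimal sketch of the rule, using made-up vectors (longvec and shortvec are illustrative names, not part of these notes):

```r
# Hypothetical vectors, chosen only to illustrate recycling
longvec  <- c(10, 20, 30, 40)
shortvec <- c(1, 2)

# shortvec is recycled to c(1, 2, 1, 2) before multiplying;
# no warning here, because 4 is a multiple of 2
recycled <- longvec * shortvec

# Writing the recycling out by hand gives the same result
byhand <- longvec * rep(shortvec, times = 2)
```

This is why a silent length mismatch can produce plausible-looking but wrong numbers, so it is worth checking lengths before combining vectors.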
More vectors
c(...) can be used to concatenate vectors as well
> secondvec<-c(65, 109, 109.80)
> combinedvec<-c(firstvec, secondvec)
> combinedvec
[1] 89.000 10.000 271.401 65.000
[5] 109.000 109.800
Matrix Operations
A matrix can be thought of as a collection of vectors
Using the matrix function, you can create a matrix from a single vector, filling the matrix from top to bottom, left to right:
> firstmatrix<-matrix(combinedvec,nrow=3, ncol=2)
> firstmatrix
[,1] [,2]
[1,] 89.000 65.0
[2,] 10.000 109.0
[3,] 271.401 109.8
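The fill order can be switched with the byrow argument of matrix. A small sketch with made-up values (vals, bycol and byrow.mat are illustrative names only):

```r
vals <- c(1, 2, 3, 4, 5, 6)   # hypothetical values

# Default: fill down the first column, then down the second
bycol <- matrix(vals, nrow = 3, ncol = 2)

# byrow = TRUE: fill across the first row, then the second, and so on
byrow.mat <- matrix(vals, nrow = 3, ncol = 2, byrow = TRUE)
```

Here bycol has 1, 2, 3 in its first column, while byrow.mat has 1, 2 in its first row.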
More matrices
Accessing/changing elements of a matrix requires specifying a row index and a column index
> firstmatrix[3,2]
[1] 109.8
> firstmatrix[1,1]<-79
> firstmatrix
[,1] [,2]
[1,] 79.000 65.0
[2,] 10.000 109.0
[3,] 271.401 109.8
Matrices
Accessing whole columns/rows can be done by leaving the other index blank
> firstmatrix[1,]
[1] 79 65
> firstmatrix[,2]
[1] 65.0 109.0 109.8
Matrix operations
R defaults to elementwise operation for multiplication and division of matrices too
Matrix operations
> matrixa<-matrix(c(1,2,3,4), ncol=2)
> matrixa
[,1] [,2]
[1,] 1 3
[2,] 2 4
> matrixb<-matrix(c(5,6,7,8), ncol=2)
> matrixb
[,1] [,2]
[1,] 5 7
[2,] 6 8
> matrixa*matrixb
[,1] [,2]
[1,] 5 21
[2,] 12 32
Matrix operations
You can use special operators (such as %*% ) to do the normal matrix multiplication:
> matrixa%*%matrixb
[,1] [,2]
[1,] 23 31
[2,] 34 46
Matrix operations
To find the inverse of a square matrix, use the function solve.
> ainv<-solve(matrixa)
> ainv
[,1] [,2]
[1,] -2 1.5
[2,] 1 -0.5
> ainv%*%matrixa
[,1] [,2]
[1,] 1 0
[2,] 0 1
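solve also accepts a second argument, in which case it solves a linear system directly instead of returning the inverse. A short sketch, reusing matrixa from above with a made-up right-hand side (rhs is not from the notes):

```r
matrixa <- matrix(c(1, 2, 3, 4), ncol = 2)
rhs <- c(5, 6)                        # hypothetical right-hand side

# Solve matrixa %*% x = rhs in one step
x.direct <- solve(matrixa, rhs)

# Same answer by forming the inverse first
x.inverse <- solve(matrixa) %*% rhs
```

Both routes give x = (-1, 2) here; the one-step form avoids computing an inverse that is only needed once.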
Functions
Functions can be thought of as shortcuts to perform sequences of calculations
All functions take arguments and most useful functions return objects
Functions can call other functions
Calling functions
We’ve already discussed a couple of functions, c(...) and matrix(...)
c(...) takes any number of arguments and returns a vector containing those values
matrix(...) takes a couple of different kinds of arguments: the values for the matrix, the number of rows, and the number of columns
Other examples of built-in R functions are mean(...), sum(...), var(...), lm(...)
Useful functions for, um, functions
args( function ) will return the possible arguments for a function
> args(matrix)
function(data = NA, nrow = 1, ncol = 1,
byrow = FALSE, dimnames = NULL)
Useful functions (cont.)
help( function ) will return a description of what the function does, the arguments it takes, and the values it returns
matrix package:base
Matrices
Description:
‘matrix’ creates a matrix from
the given set of values. ‘as.matrix’
attempts to turn its argument into a
matrix. ‘is.matrix’ tests if its
argument is a (strict) matrix.
Usage:
.....
Writing functions
The easiest way to write functions in R is to create the function in an outside text editor such as emacs or vi on Linux/Unix machines and Notepad under Windows, OR choose to create a script using the menu in R
A function declaration has a specific format
One must specify the name of the function, the arguments, and the body of the function
In R, the last calculation performed in the function determines what the returned value is
Declaring functions
Either at the command line or in a separate file, write:
functionname <- function( argument1,
argument2, ...){
function calculation 1
function calculation 2
....
returned function calculation
}
First function
The first function we can write is a standard deviation function, stdev()
stdev() should take a vector as an argument and should return the square root of the variance of the vector of observations
Here’s what we would write in our text file stdev.R:
stdev(...) definition
stdev<-function( datavector ){
sqrt( var( datavector ) )
}
Using stdev() in R
Can use the source function to load your function into R if not made in R
> source("[Source directory]/stdev.R")
> stdev
function( datavector ){
sqrt( var( datavector ) )
}
Using stdev() in R
Then test out the function using a small dataset for which you know the answer
> testvector<-c(7,3,5)
> var(testvector)
[1] 4
> sd(testvector)
[1] 2
> stdev(testvector)
[1] 2
Useful function tips
Typing the function name with no ()’s (like with stdev ) will print the function declaration to the screen (works even with built-in R functions)
Remember to use source each time you make changes to the text file that contains your function(s)
You can have multiple arguments for any given function
You can also set default values for arguments that take certain values a majority of the time
Another example function
Let’s say that we want a function that calculates a trimmed mean
Remember that a trimmed mean takes an ordered vector of data and eliminates the first and last X% of points
What arguments should the function take?
What built-in functions will we want to use?
Trimmed mean function
Arguments: Data vector (maybe take a matrix?), percentage (fixed or an argument?)
Built-in functions we’ll need: mean, sum, sort , length?
Step-by-step
Initial function declaration in trimmed.mean.R, just to make sure the syntax is correct:
trimmed.mean<-function( datavector,trim.percent=5 ){
mean(datavector)
}
Step-by-step
## Generate 100 Exponential random variables
> trim.test.vector<-rexp(100,1)
# Look at the values: yours will be different
> trim.test.vector
[1] 1.68888770 0.16894626 0.37284041 0.27736393
....
> mean(trim.test.vector)
[1] 0.9427151
> source("trimmed.mean.R")
> trimmed.mean(trim.test.vector)
[1] 0.9427151
Step-by-step
Now we need to set up the trimming...
We want to eliminate the first X% and last X% of the sorted data before taking the mean
We’ll need a couple of R tricks to do this
length(...) will give us the length of a vector
Can we assume that the data vector is in the correct order?
What are the indices of the data points that we want to include in the mean calculation?
Step-by-step
First, we can use the sort(...) function to sort the data
Then, we want to drop the same number of points from each end of the data set, trim.percent/100 * length(datavector)
So calculate dropnumber and then find the mean of the data vector without those 2*dropnumber observations
Step-by-step
trimmed.mean<-function( datavector, trim.percent=5 ){
dropnumber<-round(trim.percent/100*length(datavector))
mean(sort(datavector)[(dropnumber+1):(length(datavector)-dropnumber)])
}
Taking it apart
Round to make sure it’s an integer
dropnumber<-round(trim.percent/100*length(datavector))
sort sorts the incoming datavector and returns the sorted vector
’a:b’ is a shortcut used to get a sequence of integers between a and b
mean(sort(datavector)[(dropnumber+1):(length(datavector)-dropnumber)])
Think about what the indices are of the numbers that we want to get
Checking the results
> source("trimmed.mean.R")
> trimmed.mean(trim.test.vector)
[1] 0.813712
> trimmed.mean(trim.test.vector, trim.percent=10)
[1] 0.7287425
Why does it work?
sort(...) returns a vector containing the argument vector’s sorted values
Because sort(...) returns a vector, we can use the brackets [ ]’s to access part of the vector, i.e. sort(...)[i:j]
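One way to test trimmed.mean further is to compare it with the trim argument of the built-in mean function, which takes a fraction rather than a percentage. The sketch below assumes a seed and a test vector (checkvec) that are not part of these notes. The two agree here because 5% and 10% of 100 points are whole numbers; the built-in version rounds the drop count down, so the results can differ when trim.percent/100*length(datavector) is not an integer.

```r
trimmed.mean <- function(datavector, trim.percent = 5) {
  dropnumber <- round(trim.percent / 100 * length(datavector))
  mean(sort(datavector)[(dropnumber + 1):(length(datavector) - dropnumber)])
}

set.seed(1)                    # arbitrary seed so the check is repeatable
checkvec <- rexp(100, 1)       # hypothetical test data

ours5     <- trimmed.mean(checkvec)                  # default 5% trim
builtin5  <- mean(checkvec, trim = 0.05)             # trim is a fraction
ours10    <- trimmed.mean(checkvec, trim.percent = 10)
builtin10 <- mean(checkvec, trim = 0.10)
```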
while loops
while loops are used to repeat sets of calculations indefinitely until a specified goal has been reached
The function first checks the condition; if true, it executes the statements in order and then checks the condition again; if false, it skips everything and goes to the next statement outside the loop
The syntax for writing a while loop in R is:
while( condition ) {
looped line 1
looped line 2
...
}
Example: Newton’s method for root-finding
The standard example for stopping conditions is Newton’s method for finding roots of equations in one variable
Newton’s method uses the derivative of the function to determine a linear direction towards a root (x s.t. f(x) = 0); from a given starting value it converges to at most one root, so it cannot find all roots of an equation
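Written out, each step replaces the current guess by the root of the tangent line at that guess. This is the standard form of the Newton update, stated here because the notes describe it only in words; $x_i$ denotes the guess at the ith iteration and $f'$ the derivative of $f$:

```latex
x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}
```

The step is undefined when $f'(x_i) = 0$, which is one reason the choice of starting value matters.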
Example: Newton’s method
Arguments: function to be solved, derivative function, tolerance, initial value, maximum iterations
Return: root
What should the condition be?
Let xi be the “guessed” root at the ith iteration
If |f(xi)| > tolerance and the number of iterations is less than the maximum number of iterations, then it should continue.
Otherwise it should stop
Example: Newton’s method
newtons.method <- function( f, derivfunc,
tol=1e-3, init=0, maxiter=100){
x <- init
iter <- 1
while( (abs(f(x)) > tol) && (iter<=maxiter)){
print(x)
x <- -f(x)/derivfunc(x) + x
iter <- iter+1
}
x
}
Example code
> sinfunc <- function(x){ sin(x)*cos(x) }
> sinfunc.deriv <- function(y){ cos(y)*cos(y) - sin(y)*sin(y) }
> newtons.method(sinfunc, sinfunc.deriv, tol=1e-4, init=1)
[1] 1
[1] 2.09252
[1] 1.233947
[1] 1.633093
[1] 1.570472
[1] 1.570796
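To see how the starting value decides which root is found, here is a small sketch on a made-up function with two known roots. The names newton.quiet, sqfunc and sqfunc.deriv are illustrative only; newton.quiet follows the same steps as newtons.method but leaves out the print call.

```r
# Same algorithm as newtons.method in the notes, minus the print() call
newton.quiet <- function(f, derivfunc, tol = 1e-3, init = 0, maxiter = 100) {
  x <- init
  iter <- 1
  while ((abs(f(x)) > tol) && (iter <= maxiter)) {
    x <- x - f(x) / derivfunc(x)
    iter <- iter + 1
  }
  x
}

# Hypothetical test function with roots at +sqrt(2) and -sqrt(2)
sqfunc <- function(x) { x^2 - 2 }
sqfunc.deriv <- function(x) { 2 * x }

root.pos <- newton.quiet(sqfunc, sqfunc.deriv, tol = 1e-10, init = 1)
root.neg <- newton.quiet(sqfunc, sqfunc.deriv, tol = 1e-10, init = -1)
```

Starting at 1 leads to the positive root and starting at -1 to the negative one. The default init=0 would not work for this function, since the derivative is zero there.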
Unofficial assignment
Try out some of these functions in R
Play around with some other functions like seq , rnorm
Think about how we could have written the trimmed mean function to take a matrix and sort each column
for loops
for loops are used in functions to repeat similar tasks a number of times
for loops in R are actually rather inefficient (short story: shopping for memory is like shopping for toilet paper - better to buy in bulk)
If necessary, then they are simple to implement
R for loops are actually rather intuitive
for loops
Structure/syntax of a for loop
for ( counter in vector ) {
looped line 1
looped line 2
...
}
The function will perform the list of looped lines length( vector ) times
The counter is a variable that is often used in the code for calculations or display
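A minimal sketch of this structure, with made-up names (n and squares are not from the notes). The result vector is allocated in one go before the loop, in the spirit of the "buy in bulk" advice, rather than grown one element at a time:

```r
n <- 10
squares <- rep(0, n)      # allocate the whole result vector up front

for (i in 1:n) {
  # the counter i serves both as the index and in the calculation
  squares[i] <- i^2
}
```

After the loop, squares holds 1, 4, 9, ..., 100, the same values as the vectorized expression (1:n)^2.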
Example: Convolving two finite sequences
Convolutions pop up in certain statistical (and mathematical) computations
Imagine that we have two sequences, $a_0, a_1, \ldots, a_{m-1}$ and $b_0, b_1, \ldots, b_{n-1}$, and we want to find their convolution $ab_0, ab_1, \ldots, ab_{n+m-2}$, where
$$ab_k = \sum_{i=0}^{k} a_i \, b_{k-i}$$
for valid indices $i$ and $k-i$
convolve.rfun function
What are the necessary arguments?
What should be returned from the function?
convolve.rfun function
Arguments: a data vector, b data vector
Should return a vector of length length(a) + length(b) - 1
How do we calculate the convolution?
convolve.rfun function
convolve.rfun<-function(a,b){
ab<-rep(0,length(a)+length(b) -1)
for(i in 1:length(a)){
for(j in 1:length(b)){
ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]
}
}
ab
}
convolve.rfun function
> source("convolve.R")
> args(convolve.rfun)
function (a, b)
NULL
> testavec<-c(1:10)
> testbvec<-c(1:10)
> convolve.rfun(testavec,testbvec)
[1] 1 4 10 20 35 56 84 120 165 220 264 296 315
[14] 320 310 284 241 180 100
The big problem
Our new convolution function works pretty well for small sequences
The problem is that R is absolutely horrid with loops (as are most interpreted languages)
In fact, if we look at even a moderate sequence, we see that R can take a long time
Loops and speed issues in R
> testavec<-c(1:700)
> testbvec<-c(1:700)
> system.time(resultvec<-convolve.rfun(testavec,testbvec))
[1] 20.02 0.00 20.15 0.00 0.00
Loops and speed issues in R
We see from the system.time function that our convolution function takes about 20 seconds to run
This was not a very long pair of vectors to convolve, only 700 components each in length, and the function itself is trivial to write
We have two choices: we can either try to come up with sneaky R shortcuts to speed up the function (difficult to do in this case) or move a lot of the heavy looping out to a C function
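As one possible R-only shortcut, the inner loop can be replaced by a single vector operation, leaving one loop over a. This is only a sketch, not the approach these notes go on to take; convolve.vec, smalla and smallb are made-up names. R's built-in convolve function (stats package) can also produce the same sequence when called with the second argument reversed and type="open".

```r
# Double-loop version from the notes, repeated so the block runs on its own
convolve.rfun <- function(a, b) {
  ab <- rep(0, length(a) + length(b) - 1)
  for (i in 1:length(a)) {
    for (j in 1:length(b)) {
      ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
    }
  }
  ab
}

# Hypothetical variant: add a[i] * b to a whole slice of ab at once
convolve.vec <- function(a, b) {
  ab <- rep(0, length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    ab[idx] <- ab[idx] + a[i] * b
  }
  ab
}

smalla <- c(1, 2)                   # small case with a known answer: 3 10 8
smallb <- c(3, 4)
smallresult <- convolve.vec(smalla, smallb)

loopresult <- convolve.rfun(1:10, 1:10)
vecresult  <- convolve.vec(1:10, 1:10)
fftresult  <- convolve(1:10, rev(1:10), type = "open")  # built-in, FFT-based
```

How much time the single-loop version saves will depend on the machine and the R version, so it is worth timing it with system.time before relying on it.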
if/else statements
if statements allow you to control when certain statements will be executed in your function
if statements have the following form:
if( condition ) {
calculation 1
calculation 2
...
}
The function will only perform the calculations in the {}’s if the condition is true
if/else statements
If you add an else statement after the closing } of your if block, then whenever the condition is false the function will execute what is in the else brackets
if( condition ) { if calculation }
else { else calculation }
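A small sketch of an if/else inside a function, using a made-up helper (sign.label is not part of these notes). The else is kept on the same line as the closing brace of the if block; inside a function either layout parses, but at the command line an else that starts a new line is not attached to the preceding if.

```r
# Hypothetical helper: label a number by its sign
sign.label <- function(x) {
  if (x < 0) {
    "negative"
  } else {
    "non-negative"
  }
}

label.a <- sign.label(-3)
label.b <- sign.label(0)
```

As with any R function, the value of the branch that runs last is what the function returns.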
running.mean.fast function
We’ll create a new function, running.mean.fast, that will calculate a running mean with a given window
Instead of recalculating the mean of all the observations each time we change the center of the window, we can just adjust the previous mean by dropping one observation and adding one observation
The most sticky part (again) will be the handling of the beginning and end of the vector
Helps if we divide it into three cases: beginning, middle, end
Dividing into cases
Beginning: Add observations to the total sum and add one to the denominator
Middle: Add one observation to the end, subtract one from the beginning, and the denominator stays constant
End: Subtract observations from the total sum and subtract one from the denominator.
running.mean.fast function
running.mean.fast<-function( datavector, windowsize=3 ){
resultvector<-rep( 0, length(datavector) )
k <- round((windowsize-1)/2)
#get index corresponding to windowsize
#Set initial sum
#for(i in 1:length(datavector)){
# if(beginning) { do beginning calcs}
#
# else {
# if(middle) { do middle calcs }
# else { do end calcs }
# }
#}
resultvector
}
running.mean.fast function
currentsum <- sum(datavector[1:k])
for(i in 1:length(datavector)){
#Beginning
if( i <= (k+1) ) {
currentsum <- currentsum+datavector[i+k]
resultvector[i] <- currentsum/(i+k)
}
else{
#Middle
if((i>(k+1))&(i<=(length(datavector)-k))){
currentsum <- currentsum+datavector[i+k]-datavector[i-k-1]
resultvector[i] <- currentsum/(2*k+1)
}
else{
#End
currentsum <- currentsum-datavector[i-(k+1)]
resultvector[i] <- currentsum/(length(datavector)-i+1+k)
}
}
}
resultvector
}
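Following the earlier advice to test on a small dataset with a known answer, the sketch below assembles the pieces above into one function and compares it with a slow but obvious version. running.mean.naive and testx are made-up names, and the check assumes an odd windowsize of at least 3 and a data vector at least as long as the window.

```r
# The skeleton and loop body from the notes, assembled into one function
running.mean.fast <- function(datavector, windowsize = 3) {
  n <- length(datavector)
  resultvector <- rep(0, n)
  k <- round((windowsize - 1) / 2)
  currentsum <- sum(datavector[1:k])
  for (i in 1:n) {
    if (i <= (k + 1)) {                       # beginning
      currentsum <- currentsum + datavector[i + k]
      resultvector[i] <- currentsum / (i + k)
    } else if (i <= (n - k)) {                # middle
      currentsum <- currentsum + datavector[i + k] - datavector[i - k - 1]
      resultvector[i] <- currentsum / (2 * k + 1)
    } else {                                  # end
      currentsum <- currentsum - datavector[i - (k + 1)]
      resultvector[i] <- currentsum / (n - i + 1 + k)
    }
  }
  resultvector
}

# Hypothetical reference version: recompute every window mean from scratch
running.mean.naive <- function(datavector, windowsize = 3) {
  n <- length(datavector)
  k <- round((windowsize - 1) / 2)
  sapply(seq_len(n), function(i) {
    mean(datavector[max(1, i - k):min(n, i + k)])
  })
}

testx <- c(2, 4, 6, 8, 10, 12, 14)   # made-up data with easy window means
fast3  <- running.mean.fast(testx, windowsize = 3)
naive3 <- running.mean.naive(testx, windowsize = 3)
fast5  <- running.mean.fast(testx, windowsize = 5)
naive5 <- running.mean.naive(testx, windowsize = 5)
```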
Data frames
A data frame is much like a matrix, only more flexible
Allows the user to name columns, which makes operations, such as using the regression functions lm and glm, easier
User can also attach data frames, which gives even easier access to information
The easiest way to read tabular data into R is through read.table, which returns a data frame
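A minimal read.table sketch. To keep it runnable without a data file, the table is supplied inline through the text argument; with a real file you would pass the file name instead. The column names and values are made up.

```r
# Hypothetical three-row table, supplied inline instead of from a file
rawtext <- "height weight
150 52
165 61
180 77"

# header = TRUE tells read.table that the first line holds the column names
smallframe <- read.table(text = rawtext, header = TRUE)

meanheight <- mean(smallframe$height)   # columns are accessed by name
```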
Example data frame: Ships data
Description from R help file:
Data frame giving the number of damage incidents and aggregate months of service by ship type, year of construction, and period of operation.
Using commands
> library(MASS)
> data(ships)
makes the frame available
Example data frame: Ships data
To get the names of the variables in the ships frame, use the names function
> names(ships)
[1] "type" "year" "period" "service" "incidents"
Typing ships$year accesses just the year variable
> ships$year
[1] 60 60 65 65 70 70 75 75 60 60 65 65 ....
[26] 60 65 65 70 70 75 75 60 60 65 65 70 ....
Example data frame: ships data
The variables themselves act like vectors and the data frames act like a matrix
> ships$service[25:30]
[1] 251 105 288 192 349 1208
> ships[5,]
type year period service incidents
5 A 70 60 1512 6
> ships[5,4]
[1] 1512
> ships[25:30,1:3]
type year period
25 D 60 60
26 D 60 75
27 D 65 60
28 D 65 75
29 D 70 60
30 D 70 75
Lists
Lists are different from data frames in that the different parts of a list can have different numbers of rows, but you create data frames and lists from scratch in a similar way:
> data.frame(x1=c(45,12,13), x2=c(12))
x1 x2
1 45 12
2 12 12
3 13 12
> list(x1=c(45,12,13), x2=c(12))
$x1
[1] 45 12 13
$x2
[1] 12
Lists and data frames
Notice how the data frame and the list have the same names for the variables (or parts), but the list allows for differing dimensions of the two variables (or parts)
Data frames are better used for data (duh) and lists are better used for complicated data structures returned by functions (e.g. lm returns a list that contains residuals, coefficients, etc., all of differing dimension)
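A sketch of a function that returns a list whose parts have different lengths, in the same spirit as lm. The function describe.vector and its part names are made up for illustration.

```r
# Hypothetical summary function returning parts of different lengths
describe.vector <- function(datavector) {
  list(n = length(datavector),        # one number
       range = range(datavector),     # two numbers
       sorted = sort(datavector))     # as many numbers as the input
}

result <- describe.vector(c(45, 12, 13))

partn     <- result$n            # access a part by name with $
partrange <- result[["range"]]   # or with double brackets
```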
Lists and data frames
> names(lm(incidents ~ year, data=ships))
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
> is.list(lm(incidents ~ year, data=ships))
[1] TRUE
> is.data.frame(lm(incidents ~ year, data=ships))
[1] FALSE
Plots
Plotting can be done at two levels
The first level is simple to use, but difficult to configure
The second level is very difficult to use, but allows you to control just about everything
Try to avoid throwing your keyboard on the floor while struggling to get used to the syntax
Basic plot command
Typing plot( object ) will produce a variety of results, depending on what you give the plot command
If you give it a vector, it plots each value vs. its vector index
If you give it a frame, it defaults to the pairs command, which prints out all bivariate scatterplots of all pairs of variables
If you give it a linear model object, it will output four plots (residual vs. fitted, q-q plot, fitted vs. std. residuals, and Cook’s distance)
Examples of scatterplots/function plots
x<-runif(1000,-3,3)
y<- x + 3 + rnorm(1000,0,1)
plot(x,y)
[Figure: scatterplot of y against x]
Example 2
plot(x, y, pch = '.')
[Figure: the same scatterplot of y against x, drawn with small dots]
Example 3
w<-seq(-3,3,length=1000)
plot(w, dnorm(w), type='l')
[Figure: line plot of dnorm(w) against w]
Titles, line types, labels, axes, etc.
> plot(w, dnorm(w), main='Your title goes here', type='l', lty=2)
[Figure: dashed line plot of dnorm(w) against w, titled "Your title goes here"]
Titles, line types, labels, axes, etc.
> plot(w, dnorm(w), type='l', main='Title', xlab='X-axis label',
ylab='Y-axis label')
[Figure: line plot of dnorm(w) against w, titled "Title", with axes labelled "X-axis label" and "Y-axis label"]
Titles, line types, labels, axes, etc.
> plot(w, dnorm(w), type="l", main="Title",
ylim=c(0,.3), xlim=c(-3,3))
[Figure: line plot of dnorm(w) against w, titled "Title", with the y-axis restricted to 0 to 0.3]
Histograms
> histvec<-rpois(1000,5)
> hist(histvec, main="hist(histvec)")
> hist(histvec, main="hist(histvec, nclass=10)", nclass=10)
[Figure: histogram of histvec with nclass=10; x-axis histvec (0 to 14), y-axis Frequency (0 to 150)]
Arrays
Arrays are data structures that are best used for more than 2-dimensional data, for instance, arrays of matrices
Array functions like apply will often work on matrices and data frames
For instance, if we use the attitude dataset (part of R’s built-in datasets package), then
> data(attitude)
> apply(attitude, 2, mean)
rating complaints privileges learning
64.63333 66.60000 53.13333 56.36667
raises critical advance
64.63333 74.76667 42.93333
Why use apply?
apply should perform the function more efficiently (although the speed-up is probably not as large as you would expect)
Also makes for cleaner code by cutting down on looping
> rdmmat<-matrix(rnorm(10000,4*(1:10)),10,1000)
> apply(rdmmat,1,mean)
[1] 4.003565 7.993293 11.983947 16.028533 19.939690 23.939630 27.981554
[8] 32.018374 36.047277 39.966829
> apply(rdmmat,1,var)
[1] 0.9376154 0.9554272 1.0340050 0.9289830 0.9847559 1.0051780 0.9676526
[8] 1.0440103 1.0359232 1.0346870
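To make the "cleaner code" point concrete, the sketch below computes row means three ways on a small made-up matrix (smallmat and the seed are assumptions, not from the notes): with apply, with the explicit loop that apply replaces, and with rowMeans, a built-in shortcut for this particular case.

```r
set.seed(1)                                  # arbitrary seed, for repeatability
smallmat <- matrix(rnorm(20), nrow = 4, ncol = 5)

# apply over rows (second argument 1 means rows, 2 means columns)
viaapply <- apply(smallmat, 1, mean)

# The explicit loop that apply replaces
vialoop <- rep(0, nrow(smallmat))
for (i in 1:nrow(smallmat)) {
  vialoop[i] <- mean(smallmat[i, ])
}

# Built-in shortcut for row means
viarowmeans <- rowMeans(smallmat)
```

All three give the same four numbers; which one is fastest depends on the size of the matrix and can be checked with system.time.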
That’s it!
If you need more help, feel free to contact the instructor or one of the TAs.
I would like to thank Professor Steele for generously providing an extensive R tutorial that I unscrupulously massacred into this.