A Crash Course in R (and other notes) - University of Ottawa

A Crash Course in R (and other notes)

Professor Russell Steele (1), edited, with permission, by Dr. Pierre-Jerome Bergeron (2)

(1)Department of Mathematics and Statistics, McGill University

(2) Department of Statistics and Actuarial Sciences, University of Waterloo

A Crash Course in R (and other notes) – p. 1/87

Intro to statistical computing ideas

Statistical computation is a tool, not a proof

Computational results can be used to develop intuition, but not to confirm

Computational results are also valuable as a form of data analysis

Think before you program

More time is wasted on debugging badly written programs than on anything else

Will this program be used again on another dataset or as part of another program?

Will other people be using this code?

Can I compartmentalize the programming so that it’s easier to improve speed and/or efficiency without completely re-writing code?

Has anyone else created the tools that I need to accomplish my task?

Programming bit by bit

Always have a small test dataset for which you know the correct answer

Always test the program as you write it

Try to keep things simple and specific at first, then generalize

Document, document, document

Write usable re-usable code

Find the balance between hardwiring inputs and using arguments

Make sure you’re using the right platform for your problem

Write code that is abstract but still not annoyingly vague

Open discussion points

You need to familiarize yourself with the computing environment

You can get R for free from http://cran.r-project.org/ (Windows and Linux versions); upgrading to the newest version is a good idea

If you wish to incorporate C code, you may also need GNU compilers (easy on Mac and Linux, more in the coming weeks for Windows).

Text editor and other applications

vi and Emacs are two common editors for Mac OS X and Linux/UNIX systems, although any text editor will do

XEmacs is also available for Windows

On Mac OS X, Xcode allows for editing and compiling, similar to Bloodshed Dev-C++

Eclipse, KDevelop and Anjuta are available for Linux

First ‘mission’

If you haven’t done so, download the R statistical package and install it on your own machine

Check to see if you have the necessary development software installed on your machine

Basic R stuff

Today: Quick review of basic R stuff (large bore potential)

Vectors

Matrices

Functions

Useful commands

Frames, arrays, lists

The basic R files

Anytime you start R, the application looks for two files, .Rhistory and .RData

Each .RData workspace is different

Advantage: You can have several, separate workspaces for functions, datasets, etc. that can be transferred from machine to machine

Disadvantage: You can have several, separate...

The basic R files

For Unix/Linux platforms, if you are starting R in a directory where these files do not exist, it will create new ones for you

For Windows users, the workspaces can be saved in weird places (cause Windows is weird); use File -> Load Workspace to find where it is stored on your particular machine

Save workspaces on Windows using File -> Save Workspace; on Unix/Linux rename the .RData file after saving and exiting R in order to keep from opening it again the next time.

.RData files can be used on different platforms interchangeably

Other fundamental R stuff

The .Rhistory file keeps all commands for a given workspace; the history() command lets you access them.

ls() will tell you the contents of the currently loaded workspace

library(libraryname) will allow you to access the functions and data in the libraryname library

http://cran.r-project.org/ has a list of libraries and packages that one can download that are not part of the R base package

Vectors

Building blocks of R

Create a vector with the function c( ... )

Example:

> firstvec<-c(89,10,390.38)

Can access parts of a vector using subscripts

Vectors (cont)

By using vectorname[i], where i is the ith position in the vector, we can either print:

> firstvec[2]
[1] 10

or change the contents of the vector:

> firstvec[3]<-271.401
> firstvec
[1]  89.000  10.000 271.401
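Subscripts do not have to be single positions. The following is a small sketch, reusing the values of firstvec from above; the names middle.part, without.first and large.values are only illustrative and do not come from the slides.

```r
firstvec <- c(89, 10, 271.401)

# A range of positions: elements 2 and 3
middle.part <- firstvec[2:3]

# A negative subscript drops that position: everything except element 1
without.first <- firstvec[-1]

# A logical condition keeps only the elements for which it is TRUE
large.values <- firstvec[firstvec > 50]

middle.part
without.first
large.values
```

Here the first two results happen to be the same two elements, while the logical subscript keeps 89 and 271.401.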

Variables

When I talk about variables, just think about variables as vectors with one element

Example:

> firstvar<-5
> firstvar
[1] 5

Useful for indices and constants that you’ll need later on

Vector Operations

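Using operations such as ’+’, ’-’, ’*’, ’/’ can be tricky depending on the lengths of the vectors you’re trying to use:

If the vectors are the same length, the operation is performed elementwise

If one vector contains one element, then the operation is performed for each element of the other vector

If the lengths differ in any other way, R recycles the shorter vector, and warns when the longer length is not a multiple of the shorter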

Vector Operations

> 5*firstvec
[1]  445.000   50.000 1357.005
> c(5)*firstvec
[1]  445.000   50.000 1357.005
> c(5,2)*firstvec
[1]  445.000   20.000 1357.005
Warning message:
longer object length
        is not a multiple of shorter object length in: c(5, 2) * firstvec
> c(5,2,3)*firstvec
[1] 445.000  20.000 814.203

More vectors

c(..) can be used to concatenate vectors as well

> secondvec<-c(65, 109, 109.80)
> combinedvec<-c(firstvec, secondvec)
> combinedvec
[1]  89.000  10.000 271.401  65.000
[5] 109.000 109.800

Matrix Operations

A matrix can be thought of as a collection of vectors

Using the matrix function, you can create a matrix from a single vector, filling the matrix from top to bottom, left to right:

> firstmatrix<-matrix(combinedvec,nrow=3, ncol=2)
> firstmatrix
        [,1]  [,2]
[1,]  89.000  65.0
[2,]  10.000 109.0
[3,] 271.401 109.8

More matrices

Accessing/changing elements of a matrix requires specifying a row index and a column index

> firstmatrix[3,2]
[1] 109.8
> firstmatrix[1,1]<-79
> firstmatrix
        [,1]  [,2]
[1,]  79.000  65.0
[2,]  10.000 109.0
[3,] 271.401 109.8

Matrices

Accessing whole columns/rows can be done by leaving the other index blank

> firstmatrix[1,]
[1] 79 65
> firstmatrix[,2]
[1]  65.0 109.0 109.8

Matrix operations

R defaults to elementwise operation for multiplication and division of matrices too

Matrix operations

> matrixa<-matrix(c(1,2,3,4), ncol=2)

> matrixa

[,1] [,2]

[1,] 1 3

[2,] 2 4

> matrixb<-matrix(c(5,6,7,8), ncol=2)

> matrixb

[,1] [,2]

[1,] 5 7

[2,] 6 8

> matrixa*matrixb

[,1] [,2]

[1,] 5 21

[2,] 12 32

Matrix operations

You can use special operators (such as %*%) to do the normal matrix multiplication:

> matrixa%*%matrixb
     [,1] [,2]
[1,]   23   31
[2,]   34   46

Matrix operations

To find the inverse of a square matrix, use the function solve.

> ainv<-solve(matrixa)
> ainv
     [,1] [,2]
[1,]   -2  1.5
[2,]    1 -0.5
> ainv%*%matrixa
     [,1] [,2]
[1,]    1    0
[2,]    0    1
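When the inverse is only needed to solve a linear system, solve can also take the right-hand side as a second argument. The sketch below assumes a made-up right-hand side rhs (it is not in the slides) and compares the two routes.

```r
matrixa <- matrix(c(1, 2, 3, 4), ncol = 2)
rhs <- c(5, 6)   # hypothetical right-hand side, chosen only for illustration

# Route 1: form the inverse explicitly, then multiply
x.via.inverse <- solve(matrixa) %*% rhs

# Route 2: let solve() handle the system matrixa %*% x = rhs directly
x.direct <- solve(matrixa, rhs)

x.direct
# Check that the solution reproduces the right-hand side
matrixa %*% x.direct
```

Both routes give x = (-1, 2) here; the direct form avoids building the inverse and is usually the better habit for larger systems.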

Functions

Functions can be thought of as shortcuts to perform sequences of calculations

All functions take arguments and most useful functions return objects

Functions can call other functions

Calling functions

We’ve already discussed a couple of functions: c(...) and matrix(...)

c(...) takes any number of arguments and returns a vector containing those values

matrix(...) takes a couple of different kinds of arguments: the values for the matrix, the number of rows, and the number of columns

Other examples of built-in R functions are mean(...), sum(...), var(...), lm(...)

Useful functions for, um, functions

args( function ) will return the possible arguments for a function

> args(matrix)
function(data = NA, nrow = 1, ncol = 1,
         byrow = FALSE, dimnames = NULL)

Useful functions (cont.)

help( function ) will return a description of what the function does, the arguments it takes, and the values it returns

matrix package:base

Matrices

Description:

‘matrix’ creates a matrix from

the given set of values. ‘as.matrix’

attempts to turn its argument into a

matrix. ‘is.matrix’ tests if its

argument is a (strict) matrix.

Usage:

.....

Writing functions

The easiest way to write functions in R is to create the function in an outside text editor, such as emacs or vi on Linux/Unix machines and Notepad under Windows, OR choose to create a script using the menu in R

A function declaration has a specific format

One must specify the name of the function, the arguments, and the body of the function

In R, the last calculation performed in the function determines what the returned value is

Declaring functions

Either at the command line or in a separate file, write:

functionname <- function( argument1, argument2, ...){
    function calculation 1
    function calculation 2
    ....
    returned function calculation
}
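To make the "last calculation is the returned value" rule concrete, here is a minimal sketch. Both functions are hypothetical illustrations, not part of the original notes; the second one shows return(), which R also provides for leaving a function early.

```r
# The value of the last expression evaluated is what the function returns
square.plus.one <- function(x) {
  squared <- x^2
  squared + 1
}

# return() exits immediately; otherwise the last expression is returned
safe.sqrt <- function(x) {
  if (x < 0) {
    return(NA)   # leave early for negative input
  }
  sqrt(x)
}

square.plus.one(3)
safe.sqrt(9)
safe.sqrt(-1)
```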

First function

The first function we can write is a standard deviation function, stdev()

stdev() should take a vector as an argument and should return the square root of the variance of the vector of observations

Here’s what we would write in our text file stdev.R:

stdev(...) definition

stdev<-function( datavector ){

sqrt( var( datavector ) )

}

Using stdev() in R

Can use the source function to load your function into R if not made in R

> source("[Source directory]/stdev.R")
> stdev
function( datavector ){
    sqrt( var( datavector ) )
}

Using stdev() in R

Then test out the function using a small dataset for which you know the answer

> testvector<-c(7,3,5)
> var(testvector)
[1] 4
> sd(testvector)
[1] 2
> stdev(testvector)
[1] 2
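The same check can be written so that R complains if the answer is ever wrong, which makes it easy to re-run after each change. This is only a sketch of one way to do it, using the base functions stopifnot and all.equal; the slides themselves check by eye.

```r
stdev <- function(datavector) {
  sqrt(var(datavector))
}

testvector <- c(7, 3, 5)

# all.equal() compares numbers up to a small tolerance;
# stopifnot() raises an error if any condition is not TRUE
stopifnot(isTRUE(all.equal(stdev(testvector), 2)))
stopifnot(isTRUE(all.equal(stdev(testvector), sd(testvector))))
```

If both lines run silently, the function agrees with the known answer and with the built-in sd.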

Useful function tips

Typing the function name with no ()’s (like with stdev) will print the function declaration to the screen (works even with built-in R functions)

Remember to use source each time you make changes to the text file that contains your function(s)

You can have multiple arguments for any given function

You can also set default values for arguments that take certain values a majority of the time
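As a quick illustration of multiple arguments and default values, consider the sketch below. The function rescale and its argument names are made up for this example only.

```r
# Hypothetical helper: shift and scale a vector.
# center and scale have defaults, so they can be omitted in a call.
rescale <- function(datavector, center = 0, scale = 1) {
  (datavector - center) / scale
}

rescale(c(2, 4, 6))                          # both defaults used
rescale(c(2, 4, 6), center = 4)              # override one default
rescale(c(2, 4, 6), scale = 2, center = 4)   # named arguments, any order
```

Naming the arguments in the call means their order no longer matters, which helps once a function has more than two or three of them.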

Another example function

Let’s say that we want a function that calculates a trimmed mean

Remember that a trimmed mean takes an ordered vector of data, eliminates the first and last X% of points, and averages what is left

What arguments should the function take?

What built-in functions will we want to use?

Trimmed mean function

Arguments: Data vector (maybe take a matrix?), percentage (fixed or an argument)?

Built-in functions we’ll need: mean, sum, sort, length?

Step-by-step

Initial function declaration in trimmed.mean.R, just to make sure the syntax is correct:

trimmed.mean<-function( datavector, trim.percent=5 ){

mean(datavector)

}

Step-by-step

## Generate 100 Exponential random variables

> trim.test.vector<-rexp(100,1)

# Look at the values: yours will be different

> trim.test.vector

[1] 1.68888770 0.16894626 0.37284041 0.27736393

....

> mean(trim.test.vector)

[1] 0.9427151

> source("trimmed.mean.R")

> trimmed.mean(trim.test.vector)

[1] 0.9427151

Step-by-step

Now we need to set up the trimming...

We want to eliminate the first X% and last X% of the sorted data before taking the mean

We’ll need a couple of R tricks to do this

length(...) will give us the length of a vector

Can we assume that the data vector is in the correct order?

What are the indices of the data points that we want to include in the mean calculation?

Step-by-step

First, we can use the sort(...) function to sort the data

Then, we want to drop the same number of points from each end of the data set, trim.percent/100 * length(datavector)

So calculate dropnumber and then find the mean of the data vector without those 2*dropnumber observations

Step-by-step

trimmed.mean<-function( datavector, trim.percent=5 ){
    dropnumber<-round(trim.percent/100*length(datavector))
    mean(sort(datavector)[(dropnumber+1):(length(datavector)-dropnumber)])
}

Taking it apart

Round to make sure it’s an integer

dropnumber<-round(trim.percent/100*length(datavector))

sort sorts the incoming datavector and returns the sorted vector

’a:b’ is a shortcut used to get a sequence of integers between a and b

mean(sort(datavector)[(dropnumber+1):(length(datavector)-dropnumber)])

Think about what the indices are of the numbers that we want to get

Checking the results

> source("trimmed.mean.R")

> trimmed.mean(trim.test.vector)

[1] 0.813712

> trimmed.mean(trim.test.vector, trim.percent=10)

[1] 0.7287425

Why does it work?

sort(...) returns a vector containing the argument vector’s sorted values

Because sort(...) returns a vector, we can use the brackets [ ] to access part of the vector, i.e. sort(...)[i:j]
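One way to gain confidence in trimmed.mean is to compare it with the trim argument of the built-in mean function. The sketch below copies the function from the earlier slide and uses a small made-up vector, checkvec. Note two differences that are easy to trip over: mean takes trim as a fraction rather than a percentage, and it rounds the number of dropped points down, whereas the version here uses round, so the two can disagree for some combinations of length and percentage.

```r
# Copy of the trimmed.mean function from the slides above
trimmed.mean <- function(datavector, trim.percent = 5) {
  dropnumber <- round(trim.percent / 100 * length(datavector))
  mean(sort(datavector)[(dropnumber + 1):(length(datavector) - dropnumber)])
}

# Small made-up test vector with one obvious outlier (40)
checkvec <- c(40, 3, 8, 1, 6, 2, 7, 5)

# 25% of 8 points is exactly 2 points dropped from each end
ours <- trimmed.mean(checkvec, trim.percent = 25)
builtin <- mean(checkvec, trim = 0.25)

ours
builtin
```

By hand: the sorted data are 1, 2, 3, 5, 6, 7, 8, 40; dropping two from each end leaves 3, 5, 6, 7, whose mean is 5.25.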

while loops

while loops are used to indefinitely repeat sets of calculations until a specified goal has been reached

The function first checks the condition; if true, it executes the statements in order; if false, it skips everything and goes to the next statement outside the loop

The syntax for writing a while loop in R is:

while( condition ) {

looped line 1

looped line 2

...

}

Example: Newton’s method for root-finding

The standard example for stopping conditions is Newton’s method for finding roots of equations in one variable

Newton’s method uses the derivative of the function to determine a linear direction towards a root (x s.t. f(x) = 0); it homes in on a single root that depends on the starting value, so it cannot find all roots of an equation

Example: Newton’s method

Arguments: function to be solved, derivative function, tolerance, initial value, maximum iterations

Return: root

What should the condition be?

Let xi be the “guessed” root at the ith iteration

If |f(xi)| > tolerance and the number of iterations is less than the maximum number of iterations, then it should continue.

Otherwise it should stop

Example: Newton’s method

newtons.method <- function( f, derivfunc,

tol=1e-3, init=0, maxiter=100){

x <- init

iter <- 1

while( (abs(f(x)) > tol) & (iter<=maxiter)){

print(x)

x <- -f(x)/derivfunc(x) + x

iter <- iter+1

}

x

}

Example code

> sinfunc <- function(x){ sin(x)*cos(x) }

> sinfunc.deriv <- function(y){ cos(y)*cos(y) - sin(y)*sin(y)}

> newtons.method(sinfunc, sinfunc.deriv, tol=1e-4, init=1)

[1] 1

[1] 2.09252

[1] 1.233947

[1] 1.633093

[1] 1.570472

[1] 1.570796
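The iterates settle near 1.570796, which is pi/2. As a sanity check that does not rely on our own code, base R's uniroot can search for a root inside an interval where the function changes sign. The interval c(1, 2) and the tolerance below are choices made for this sketch, not values from the slides.

```r
sinfunc <- function(x) { sin(x) * cos(x) }

# sinfunc(1) is positive and sinfunc(2) is negative, so a root lies between them
check <- uniroot(sinfunc, interval = c(1, 2), tol = 1e-8)

check$root   # should be close to pi/2
pi / 2
```

Unlike Newton's method, uniroot needs a bracketing interval rather than a derivative, so the two approaches complement each other.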

Unofficial assignment

Try out some of these functions in R

Play around with some other functions like seq, rnorm

Think about how we could have written the trimmed mean function to take a matrix and sort each column

for loops

for loops are used in functions to repeat similar tasks a number of times

for loops in R are actually rather inefficient (short story: shopping for memory is like shopping for toilet paper - better to buy in bulk)

If necessary, then they are simple to implement

R for loops are actually rather intuitive

for loops

Structure/syntax of a for loop

for ( counter in vector ) {
    looped line 1
    looped line 2
    ...
}

The function will perform the list of looped lines length(vector) times

The counter is a variable that is often used in the code for calculations or display
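The "buy memory in bulk" advice can be sketched as follows. All three versions compute the same squares; the variable names are illustrative, and no timings are claimed here since they depend on the machine and the R version.

```r
n <- 1000

# Growing the result one element at a time: the vector is re-created on every pass
grown <- c()
for (i in 1:n) {
  grown <- c(grown, i^2)
}

# Buying in bulk: allocate the full vector once, then fill it in
prealloc <- numeric(n)
for (i in 1:n) {
  prealloc[i] <- i^2
}

# Often no loop is needed at all, because arithmetic is elementwise
vectorised <- (1:n)^2

all.equal(grown, prealloc)
all.equal(prealloc, vectorised)
```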

Example: Convolving two finite sequences

Convolutions pop up in certain statistical (and mathematical) computations

Imagine that we have two sequences: $a_0, a_1, \ldots, a_{m-1}$ and $b_0, b_1, \ldots, b_{n-1}$ and we want to find their convolution: $ab_0, ab_1, \ldots, ab_{n+m-2}$ where

$$ab_k = \sum_{i=0}^{k} a_i \, b_{k-i}$$

for valid indices $i$ and $k - i$

convolve.rfun function

What are the necessary arguments?

What should be returned from the function?

convolve.rfun function

Arguments: a data vector, b data vector

Should return a vector of length length(a) + length(b) - 1

How do we calculate the convolution?

convolve.rfun function

convolve.rfun<-function(a,b){

ab<-rep(0,length(a)+length(b) -1)

for(i in 1:length(a)){

for(j in 1:length(b)){

ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]

}

}

ab

}

convolve.rfun function

> source("convolve.R")

> args(convolve.rfun)

function (a, b)

NULL

> testavec<-c(1:10)

> testbvec<-c(1:10)

> convolve.rfun(testavec,testbvec)

[1] 1 4 10 20 35 56 84 120 165 220 264 296 315

[14] 320 310 284 241 180 100
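The printed result can be cross-checked against R's own convolve function from the stats package. The R help page notes that the usual definition of convolution corresponds to convolve(x, rev(y), type = "open"), so the second argument has to be reversed. The vector slide.result below is simply the output shown above, typed in by hand for this sketch.

```r
testavec <- 1:10
testbvec <- 1:10

# Built-in convolution; it uses the FFT, so results are floating point
builtin <- convolve(testavec, rev(testbvec), type = "open")

# Output of convolve.rfun(testavec, testbvec) as shown on the slide
slide.result <- c(1, 4, 10, 20, 35, 56, 84, 120, 165, 220, 264, 296, 315,
                  320, 310, 284, 241, 180, 100)

# Compare up to numerical tolerance rather than exactly
all.equal(builtin, slide.result)
```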

The big problem

Our new convolution function works pretty well for small sequences

The problem is that R is absolutely horrid with loops (as are most interpreted languages)

In fact, if we look at even a moderate sequence, we see that R can take a long time

Loops and speed issues in R

> testavec<-c(1:700)

> testbvec<-c(1:700)

> system.time(resultvec<-convolve.rfun(testavec,testbvec))

[1] 20.02 0.00 20.15 0.00 0.00

Loops and speed issues in R

We see from the system.time function that our convolution function takes about 20 seconds to run

This was not a very long pair of vectors to convolve, only 700 components each in length, and the function itself is trivial to write

We have one of two choices: we can either try to come up with sneaky R shortcuts to speed up the function (difficult to do in this case) or hand off a lot of the heavy looping to a C function
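As one example of what an R-level shortcut might look like, the inner loop can be replaced by a single vectorised update, so that only length(a) passes through the interpreter remain instead of length(a)*length(b). This is a sketch under that idea only; convolve.vec is a hypothetical name, and how much time it saves is not measured here.

```r
# Hypothetical variant of convolve.rfun with the inner loop vectorised
convolve.vec <- function(a, b) {
  ab <- rep(0, length(a) + length(b) - 1)
  offsets <- seq_along(b) - 1          # 0, 1, ..., length(b) - 1
  for (i in seq_along(a)) {
    idx <- i + offsets                 # positions touched by a[i]
    ab[idx] <- ab[idx] + a[i] * b      # add a[i] times the whole of b at once
  }
  ab
}

convolve.vec(1:3, 1:3)
```

For the small case above, the result corresponds to the coefficients of (1 + 2x + 3x^2)^2, namely 1, 4, 10, 12, 9.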

if/else statements

if statements allow you to control when certain statements will be executed in your function

if statements follow the following form:

if( condition ) {
    calculation 1
    calculation 2
    ...
}

The function will only perform the calculations in the {}’s if the condition is true

if/else statements

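If you add an else statement after the closing } of your if block, then when the condition is false, the function will execute what is in the else brackets

Keep else on the same line as that closing }; at the command line, R otherwise treats the if as already finished

if( condition ) { if calculation } else { else calculation }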
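A small, self-contained illustration of the if/else form follows. The function describe.sign is made up for this sketch; it chains a second condition with else if, which R also allows.

```r
# Hypothetical example: classify a single number by its sign
describe.sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

describe.sign(3.2)
describe.sign(-1)
describe.sign(0)
```

Note that if expects a single TRUE or FALSE, so this function is meant for one number at a time rather than a whole vector.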

running.mean.fast function

We’ll create a new function, running.mean.fast, that will calculate a running mean with a given window

Instead of recalculating the mean of all the observations each time we change the center of the window, we can just adjust the previous mean by dropping one observation and adding one observation

The most sticky part (again) will be the handling of the beginning and end of the vector

Helps if we divide it into three cases: beginning, middle, end

Dividing into cases

Beginning: Add observations to the total sum and add one to the denominator

Middle: Add one observation to the end, subtract one from the beginning, and the denominator stays constant

End: Subtract observations from the total sum and subtract one from the denominator.

running.mean.fast function

running.mean.fast<-function( datavector, windowsize=3 ){

resultvector<-rep( 0, length(datavector) )

k <- round((windowsize-1)/2)

#get index corresponding to windowsize

#Set initial sum

#for(i in 1:length(datavector)){

# if(beginning) { do beginning calcs}

#

# else {

# if(middle) { do middle calcs }

# else { do end calcs }

# }

#}

resultvector

}

running.mean.fast function

currentsum <- sum(datavector[1:k])

for(i in 1:length(datavector)){

#Beginning

if( i <= (k+1) ) {

currentsum <- currentsum+datavector[i+k]

resultvector[i] <- currentsum/(i+k)

}

else{

#Middle

if((i>(k+1))&(i<=(length(datavector)-k))){

currentsum <- currentsum+datavector[i+k]-datavector[i-k-1]

resultvector[i] <- currentsum/(2*k+1)

}

else{

#End

currentsum <- currentsum-datavector[i-(k+1)]

resultvector[i] <- currentsum/(length(datavector)-i+1+k)

}

}

}

resultvector

}
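Because the index bookkeeping in the fast version is easy to get wrong, it helps to have a slow but obviously correct version to compare against, in the spirit of "always have a small test dataset for which you know the correct answer". The function running.mean.naive below is a sketch written for that purpose and is not part of the original notes; it recomputes each window mean from scratch.

```r
# Hypothetical reference implementation: recompute every window mean directly
running.mean.naive <- function(datavector, windowsize = 3) {
  n <- length(datavector)
  k <- round((windowsize - 1) / 2)     # half-width, as in running.mean.fast
  resultvector <- rep(0, n)
  for (i in 1:n) {
    lo <- max(1, i - k)                # window is cut off at the start ...
    hi <- min(n, i + k)                # ... and at the end of the vector
    resultvector[i] <- mean(datavector[lo:hi])
  }
  resultvector
}

running.mean.naive(c(1, 2, 3, 4, 5))
```

For the vector 1, 2, 3, 4, 5 with a window of 3, the expected running means are 1.5, 2, 3, 4, 4.5; running.mean.fast should return the same values when given the same input.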

Data frames

A data frame is much like a matrix, only more flexible

Allows the user to name columns, which makes operations, such as using the regression functions lm and glm, easier

User can also attach data frames, which gives even easier access to information

The easiest way to read tabular data into R is through read.table, which returns a data frame
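A minimal sketch of read.table follows. To keep it self-contained, the table is supplied through the text argument instead of a file name; the column names and values are made up for illustration and are not the real ships data.

```r
# Made-up table contents; in practice this would usually live in a file
raw.text <- "type year service
A 60 127
A 65 1095
B 60 44882"

# header = TRUE tells read.table that the first line holds the column names
small.frame <- read.table(text = raw.text, header = TRUE)

names(small.frame)
small.frame$service
is.data.frame(small.frame)
```

With a real file, the first argument would be the file name, for example read.table("mydata.txt", header = TRUE), where mydata.txt is a placeholder.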

Example data frame: Ships data

Description from R help file:

Data frame giving the number of damage incidents and aggregate months of service by ship type, year of construction, and period of operation.

Using commands

> library(MASS)

> data(ships)

makes the frame available

Example data frame: Ships data

To get the names of the variables in the ships frame, use the names function

> names(ships)

[1] "type" "year" "period" "service" "incidents"

Typing ships$year accesses just the year variable

> ships$year

[1] 60 60 65 65 70 70 75 75 60 60 65 65 ....

[26] 60 65 65 70 70 75 75 60 60 65 65 70 ....

Example data frame: ships data

The variables themselves act like vectors and the data frames act like a matrix

> ships$service[25:30]

[1] 251 105 288 192 349 1208

> ships[5,]

  type year period service incidents

5 A 70 60 1512 6

> ships[5,4]

[1] 1512

> ships[25:30,1:3]

type year period

25 D 60 60

26 D 60 75

27 D 65 60

28 D 65 75

29 D 70 60

30 D 70 75

Lists

Lists are different from data frames in that the different parts of a list can have different numbers of rows, but you create data frames and lists from scratch in a similar way:

> data.frame(x1=c(45,12,13), x2=c(12))

  x1 x2

1 45 12

2 12 12

3 13 12

> list(x1=c(45,12,13), x2=c(12))

$x1

[1] 45 12 13

$x2

[1] 12

Lists and data frames

Notice how the data frame and the list have the same names for the variables (or parts), but the list allows for differing dimensions of the two variables (or parts)

Data frames are better used for data (duh) and lists are better used for complicated data structures returned by functions (e.g. lm returns a list that contains residuals, coefficients, etc., all of differing dimension)

Lists and data frames

> names(lm(incidents~year, data=ships))

[1] "coefficients" "residuals" "effects" "rank"

[5] "fitted.values" "assign" "qr" "df.residual"

[9] "xlevels" "call" "terms" "model"

> is.list(lm(incidents~year,data=ships))

[1] TRUE

> is.data.frame(lm(incidents~year,data=ships))

[1] FALSE
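Since the fitted model is a list, its parts can be pulled out by name with $ or with double brackets. The sketch below uses a tiny made-up data frame, toy, so that it runs without any extra package; the numbers carry no meaning.

```r
# Made-up data, only to have something to fit
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))
fit <- lm(y ~ x, data = toy)

fit$coefficients        # access a part by name with $
fit[["residuals"]]      # the same kind of access with [[ ]]

# The parts have different lengths, which a data frame would not allow
length(fit$coefficients)
length(fit$residuals)
```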

Plots

Plotting can be done at two levels

The first level is simple to use, but difficult to configure

The second level is very difficult to use, but allows you to control just about everything

Try to avoid throwing your keyboard on the floor while struggling to get used to the syntax

Basic plot command

Typing plot( object ) will produce a variety of results, depending on what you give the plot command

If you give it a vector, it plots each value vs. its vector index

If you give it a data frame, it defaults to the pairs command, which prints out all bivariate scatterplots of all pairs of variables

If you give it a linear model object, it will output four plots (residual vs. fitted, q-q plot, fitted vs. std. residuals, and Cook’s distance)

Examples of scatterplots/function plots

x<-runif(1000,-3,3)

y<- x + 3 + rnorm(1000,0,1)

plot(x,y)

[Figure: scatterplot of y against x]

Example 2

plot(x, y, pch = '.')

[Figure: scatterplot of y against x, drawn with small dots as the plotting symbol]

Example 3

w<-seq(-3,3,length=1000)

plot(w, dnorm(w), type='l')

[Figure: line plot of dnorm(w) against w]

Titles, line types, labels, axes, etc.

> plot(w,dnorm(w), main='Your title goes here', type='l', lty=2)

[Figure: dashed line plot of dnorm(w) against w, titled "Your title goes here"]

Titles, line types, labels, axes, etc.

> plot(w, dnorm(w), type='l', main='Title', xlab='X-axis label',

ylab='Y-axis label')

[Figure: line plot of dnorm(w) against w, titled "Title", with axis labels "X-axis label" and "Y-axis label"]

Titles, line types, labels, axes, etc.

> plot(w, dnorm(w), type="l", main="Title",

ylim=c(0,.3), xlim=c(-3,3))

[Figure: line plot of dnorm(w) against w, titled "Title", with the y-axis limited to the range 0 to 0.3]

Histograms

> histvec<-rpois(1000,5)

> hist(histvec, main="hist(histvec)")

> hist(histvec, main="hist(histvec, nclass=10)", nclass=10)

[Figure: histogram titled "hist(histvec, nclass=10)", with histvec on the x-axis and Frequency on the y-axis]

Arrays

Arrays are data structures that are best used for more than 2-dimensional data, for instance, arrays of matrices

Array functions like apply will often work on matrices and data frames

For instance, if we use the attitude dataset (it ships with R’s own datasets package, so loading MASS is not strictly required), then

> library(MASS)

> data(attitude)

> apply(attitude, 2, mean)

rating complaints privileges learning

64.63333 66.60000 53.13333 56.36667

raises critical advance

64.63333 74.76667 42.93333

Why use apply?

apply should more efficiently perform the function (although the speed-up is probably not as large as you would expect)

Also makes for cleaner code to cut down on looping

> rdmmat<-matrix(rnorm(10000,4*(1:10)),10,1000)

> apply(rdmmat,1,mean)

[1] 4.003565 7.993293 11.983947 16.028533 19.939690 23.939630 27.981554

[8] 32.018374 36.047277 39.966829

> apply(rdmmat,1,var)

[1] 0.9376154 0.9554272 1.0340050 0.9289830 0.9847559 1.0051780 0.9676526

[8] 1.0440103 1.0359232 1.0346870
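For the specific case of row or column means, base R also provides rowMeans and colMeans, which give the same numbers as apply with mean. The sketch below repeats the example with a fixed seed; the seed value is arbitrary and only makes the run reproducible, so the numbers will differ from those printed above.

```r
set.seed(1)   # arbitrary seed, not from the slides

# Row i is drawn from a normal distribution with mean 4*i
rdmmat <- matrix(rnorm(10000, 4 * (1:10)), 10, 1000)

row.means.apply <- apply(rdmmat, 1, mean)   # general-purpose route
row.means.fast <- rowMeans(rdmmat)          # dedicated function for this task

all.equal(row.means.apply, row.means.fast)
```

apply remains the more general tool, since it accepts any function (var, median, a function of your own), while rowMeans and colMeans cover only this one common case.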

That’s it!

If you need more help, feel free to contact the instructor or one of the TAs.

I would like to thank Professor Steele for generously providing an extensive R tutorial that I unscrupulously massacred into this.
