
A Crash Course in R (and other notes) - University of Ottawa


A Crash Course in R (and other notes)

Professor Russell Steele (1), edited, with permission, by Dr. Pierre-Jérôme Bergeron (2)

(1) Department of Mathematics and Statistics, McGill University

(2) Department of Statistics and Actuarial Sciences, University of Waterloo

A Crash Course in R (and other notes) – p. 1/87


Intro to statistical computing ideas

Statistical computation is a tool, not a proof

Computational results can be used to develop intuition, but not to confirm it

Computational results are also valuable as a form of data analysis


Think before you program

More time is wasted on debugging badly written programs than on anything else

Will this program be used again on another dataset or as part of another program?

Will other people be using this code?

Can I compartmentalize the programming so that it’s easier to improve speed and/or efficiency without completely re-writing code?

Has anyone else created the tools that I need to accomplish my task?


Programming bit by bit

Always have a small test dataset for which you know the correct answer

Always test the program as you write it

Try to keep things simple and specific at first, then generalize

Document, document, document
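The "small test dataset" habit can be sketched as follows. This is only an illustration: `my.range` is a hypothetical helper that does not appear in these notes, and base R's `stopifnot()` is assumed as the checking tool.

```r
# Hypothetical helper used only for illustration:
# the range (max - min) of a vector
my.range <- function(datavector) {
  max(datavector) - min(datavector)
}

# Small test dataset for which the answer is known by hand: 9 - 2 = 7
testvector <- c(4, 9, 2, 7)

# stopifnot() is silent when the check passes and stops with an error otherwise
stopifnot(my.range(testvector) == 7)
```

Re-running checks like this after every change is one cheap way to test the program as you write it.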


Write usable re-usable code

Find the balance between hardwiring inputs and using arguments

Make sure you’re using the right platform for your problem

Write code that is abstract but still not annoyingly vague


Open discussion points

Need to familiarize yourself with the computing environment

You can get R for free from http://cran.r-project.org/ (Windows and Linux versions); upgrading to the newest version is a good idea

If you wish to incorporate C code, you may also need GNU compilers (easy on Mac and Linux, more in the coming weeks for Windows).


Text editor and other applications

vi and Emacs are two common editors for Mac OS X and Linux/UNIX systems, although any text editor will do

XEmacs is also available for Windows

On Mac OS X, Xcode allows for editing and compiling, similar to Bloodshed Dev-C++

Eclipse, KDevelop and Anjuta are available for Linux


First ‘mission’

If you haven’t done so, download the R statistical package and install it on your own machine

Check to see if you have the necessary development software installed on your machine


Basic R stuff

Today: Quick review of basic R stuff (large bore potential)

Vectors

Matrices

Functions

Useful commands

Frames, arrays, lists


The basic R files

Anytime you start R, the application looks for two files, .Rhistory and .RData

Each .RData workspace is different

Advantage: You can have several, separate workspaces for functions, datasets, etc. that can be transferred from machine to machine

Disadvantage: You can have several, separate...


The basic R files

For Unix/Linux platforms, if you are starting R in a directory where these files do not exist, it will create new ones for you

For Windows users, the workspaces can be saved in weird places ('cause Windows is weird); use File -> Load Workspace to find where it is stored on your particular machine

Save workspaces on Windows using File -> Save Workspace; on Unix/Linux, rename the .RData file after saving and exiting R in order to keep from opening it again the next time

.RData files can be used on different platforms interchangeably
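Workspaces can also be saved and restored from the command line, which works the same way on every platform. The sketch below assumes base R's `save()` and `load()`; the object name `demo.value` and the temporary file are hypothetical and only for illustration. (`save.image()` does the same for the whole workspace at once.)

```r
# Work in a temporary file so that no existing .RData file is overwritten
workspace.file <- tempfile(fileext = ".RData")

demo.value <- c(1, 2, 3)                  # hypothetical object worth keeping
save(demo.value, file = workspace.file)   # save() writes only the named objects

rm(demo.value)                            # remove it from the current workspace
load(workspace.file)                      # load() restores saved objects by name
demo.value
```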


Other fundamental R stuff

The .Rhistory file keeps all commands for a given workspace; the history() command lets you access them.

ls() will tell you the contents of the currently loaded workspace

library( libraryname ) will allow you to access the functions and data in the libraryname library

http://cran.r-project.org/ has a list of libraries and packages that one can download that are not part of the R base package


Vectors

Building blocks of R

Create a vector with the function c( ... )

Example:

> firstvec<-c(89,10,390.38)

Can access parts of a vector using subscripts


Vectors (cont)

By using vectorname[i], where i is the ith position in the vector, we can either print:

> firstvec[2]
[1] 10

or change the contents of the vector:

> firstvec[3]<-271.401
> firstvec
[1] 89.000 10.000 271.401


Variables

When I talk about variables, just think about variables as vectors with one element

Example:

> firstvar<-5
> firstvar
[1] 5

Useful for indices and constants that you’ll need later on


Vector Operations

Using operations such as ’+’, ’-’, ’*’, ’/’ can be tricky depending on the lengths of the vectors you’re trying to use:

If the vectors are the same length, the operation is performed elementwise

If one vector contains one element, then the operation is performed for each element of the vector


Vector Operations

> 5*firstvec
[1] 445.000 50.000 1357.005
> c(5)*firstvec
[1] 445.000 50.000 1357.005
> c(5,2)*firstvec
[1] 445.000 20.000 1357.005
Warning message:
longer object length
is not a multiple of shorter object length in: c(5, 2) * firstvec
> c(5,2,3)*firstvec
[1] 445.000 20.000 814.203


More vectors

c(..) can be used to concatenate vectors as well

> secondvec<-c(65, 109, 109.80)
> combinedvec<-c(firstvec, secondvec)
> combinedvec
[1] 89.000 10.000 271.401 65.000
[5] 109.000 109.800


Matrix Operations

A matrix can be thought of as a collection of vectors

Using the matrix function, you can create a matrix from a single vector, filling the matrix from top to bottom, left to right:

> firstmatrix<-matrix(combinedvec,nrow=3, ncol=2)
> firstmatrix
        [,1]  [,2]
[1,]  89.000  65.0
[2,]  10.000 109.0
[3,] 271.401 109.8
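To make the fill order concrete, here is a small sketch contrasting the default column-by-column fill with the `byrow = TRUE` argument of `matrix`. The names `filled.by.column` and `filled.by.row` are made up for this illustration.

```r
combinedvec <- c(89, 10, 271.401, 65, 109, 109.8)

# Default (byrow = FALSE): fill down the first column, then the second
filled.by.column <- matrix(combinedvec, nrow = 3, ncol = 2)

# byrow = TRUE: fill across the first row, then the second, and so on
filled.by.row <- matrix(combinedvec, nrow = 3, ncol = 2, byrow = TRUE)

filled.by.column[2, 1]   # second element of combinedvec: 10
filled.by.row[2, 1]      # third element of combinedvec: 271.401
```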


More matrices

Accessing/changing elements of a matrix requires specifying a row index and a column index

> firstmatrix[3,2]
[1] 109.8
> firstmatrix[1,1]<-79
> firstmatrix
        [,1]  [,2]
[1,]  79.000  65.0
[2,]  10.000 109.0
[3,] 271.401 109.8


Matrices

Accessing whole columns/rows can be done by leaving the other index blank

> firstmatrix[1,]
[1] 79 65
> firstmatrix[,2]
[1] 65.0 109.0 109.8


Matrix operations

R defaults to elementwise operation for multiplication and division of matrices too


Matrix operations

> matrixa<-matrix(c(1,2,3,4), ncol=2)

> matrixa

[,1] [,2]

[1,] 1 3

[2,] 2 4

> matrixb<-matrix(c(5,6,7,8), ncol=2)

> matrixb

[,1] [,2]

[1,] 5 7

[2,] 6 8

> matrixa*matrixb

[,1] [,2]

[1,] 5 21

[2,] 12 32


Matrix operations

You can use special operators (such as %*%) to do the normal matrix multiplication:

> matrixa%*%matrixb
     [,1] [,2]
[1,]   23   31
[2,]   34   46


Matrix operations

To find the inverse of a square matrix, use the function solve.

> ainv<-solve(matrixa)
> ainv
     [,1] [,2]
[1,]   -2  1.5
[2,]    1 -0.5
> ainv%*%matrixa
     [,1] [,2]
[1,]    1    0
[2,]    0    1


Functions

Functions can be thought of as shortcuts to perform sequences of calculations

All functions take arguments and most useful functions return objects

Functions can call other functions


Calling functions

We’ve already discussed a couple of functions: c(...) and matrix(...)

c(...) takes any number of arguments and returns a vector containing those values

matrix(...) takes a couple of different kinds of arguments: the values for the matrix, the number of rows, and the number of columns

Other examples of built-in R functions are mean(...), sum(...), var(...), lm(...)


Useful functions for, um, functions

args( function ) will return the possible arguments for a function

> args(matrix)
function(data = NA, nrow = 1, ncol = 1,
         byrow = FALSE, dimnames = NULL)


Useful functions (cont.)

help( function ) will return a description of what the function does, the arguments it takes, and the values it returns

matrix package:base

Matrices

Description:

‘matrix’ creates a matrix from

the given set of values. ‘as.matrix’

attempts to turn its argument into a

matrix. ‘is.matrix’ tests if its

argument is a (strict) matrix.

Usage:

.....


Writing functions

The easiest way to write functions in R is to create the function in an outside text editor such as emacs or vi on Linux/Unix machines and Notepad under Windows OR choose to create a script using the menu in R

A function declaration has a specific format

One must specify the name of the function, the arguments, and the body of the function

In R, the last calculation performed in the function determines what the returned value is


Declaring functions

Either at the command line or in a separate file, write:

functionname <- function( argument1, argument2, ...){
    function calculation 1
    function calculation 2
    ....
    returned function calculation
}


First function

The first function we can write is a standard deviation function, stdev()

stdev() should take a vector as an argument and should return the square root of the variance of the vector of observations

Here’s what we would write in our text file stdev.R:


stdev(...) definition

stdev<-function( datavector ){

sqrt( var( datavector ) )

}


Using stdev() in R

Can use the source function to load your function into R if not made in R

> source("[Source directory]/stdev.R")
> stdev
function( datavector ){
    sqrt( var( datavector ) )
}


Using stdev() in R

Then test out the function using a small dataset for which you know the answer

> testvector<-c(7,3,5)
> var(testvector)
[1] 4
> sd(testvector)
[1] 2
> stdev(testvector)
[1] 2


Useful function tips

Typing the function name with no ()’s (like with stdev) will print the function declaration to the screen (works even with built-in R functions)

Remember to use source each time you make changes to the text file that contains your function(s)

You can have multiple arguments for any given function

You can also set default values for arguments that take certain values a majority of the time
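As a small sketch of multiple arguments and default values, consider the function below. The name `power.vector` and its `power` argument are hypothetical, chosen only for this illustration.

```r
# Hypothetical example: raise each element of a vector to a power.
# 'power' defaults to 2 on the assumption that squaring is the common case.
power.vector <- function(datavector, power = 2) {
  datavector^power
}

power.vector(c(1, 2, 3))              # uses the default, power = 2
power.vector(c(1, 2, 3), power = 3)   # overrides the default
```

Callers who are happy with the default can ignore the second argument entirely, which keeps the common call short.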


Another example function

Let’s say that we want a function that calculates a trimmed mean

Remember that a trimmed mean takes an ordered vector of data and eliminates the first and last X% of points

What arguments should the function take?

What built-in functions will we want to use?


Trimmed mean function

Arguments: Data vector (maybe take a matrix?), percentage (fixed or an argument)?

Built-in functions we’ll need: mean, sum, sort, length?


Step-by-step

Initial function declaration in trimmed.mean.R, just to make sure the syntax is correct:

trimmed.mean<-function( datavector, trim.percent=5 ){

mean(datavector)

}


Step-by-step

## Generate 100 Exponential random variables

> trim.test.vector<-rexp(100,1)

# Look at the values: yours will be different

> trim.test.vector

[1] 1.68888770 0.16894626 0.37284041 0.27736393

....

> mean(trim.test.vector)

[1] 0.9427151

> source("trimmed.mean.R")

> trimmed.mean(trim.test.vector)

[1] 0.9427151


Step-by-step

Now we need to set up the trimming...

We want to eliminate the first X% and last X% of the sorted data before taking the mean

We’ll need a couple of R tricks to do this

length(...) will give us the length of a vector

Can we assume that the data vector is in the correct order?

What are the indices of the data points that we want to include in the mean calculation?


Step-by-step

First, we can use the sort(...) function to sort the data

Then, we want to drop the same number of points from each end of the data set, trim.percent/100 * length(datavector)

So calculate dropnumber and then find the mean of the data vector without those 2*dropnumber observations


Step-by-step

trimmed.mean<-function( datavector, trim.percent=5 ){
    dropnumber<-round(trim.percent/100*length(datavector))
    mean(sort(datavector)[(dropnumber+1):(length(datavector)-dropnumber)])
}


Taking it apart

Round to make sure it’s an integer:

dropnumber<-round(trim.percent/100*length(datavector))

sort sorts the incoming datavector and returns the sorted vector

’a:b’ is a shortcut used to get a sequence of integers between a and b

mean(sort(datavector)[(dropnumber+1):(length(datavector)-dropnumber)])

Think about what the indices are of the numbers that we want to get


Checking the results

> source("trimmed.mean.R")

> trimmed.mean(trim.test.vector)

[1] 0.813712

> trimmed.mean(trim.test.vector, trim.percent=10)

[1] 0.7287425
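Because the exponential test vector is random, it cannot tell us whether the answer is right. A complementary check, sketched here, uses a small dataset whose trimmed mean can be worked out by hand. `known.vector` is a made-up example; `trimmed.mean` is repeated from the notes so that the snippet runs on its own.

```r
# trimmed.mean as defined in these notes
trimmed.mean <- function(datavector, trim.percent = 5) {
  dropnumber <- round(trim.percent / 100 * length(datavector))
  mean(sort(datavector)[(dropnumber + 1):(length(datavector) - dropnumber)])
}

# 20 values: the integers 1 to 19 plus one large outlier
known.vector <- c(1000, 1:19)

mean(known.vector)           # (190 + 1000) / 20 = 59.5
trimmed.mean(known.vector)   # drops 1 and 1000, leaving 2:19, whose mean is 10.5
```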


Why does it work?

sort(...) returns a vector containing the argument vector’s sorted values

Because sort(...) returns a vector, we can use the brackets [ ] to access part of the vector, i.e. sort(...)[i:j]
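A tiny illustration of subscripting the result of sort(...) directly, using a made-up vector `unsorted`:

```r
unsorted <- c(5, 1, 4, 2, 3)

# sort() returns a new vector, so it can be subscripted right away
sort(unsorted)[2:4]    # the 2nd through 4th smallest values: 2 3 4

# The same result in two steps
sorted <- sort(unsorted)
sorted[2:4]
```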


while loops

while loops are used to repeat sets of calculations indefinitely until a specified goal has been reached

The function first checks the condition; if true, it executes the statements in order and then checks the condition again; if false, it skips everything and goes to the next statement outside the loop

The syntax for writing a while loop in R is:

while( condition ) {

looped line 1

looped line 2

...

}
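A minimal sketch of this syntax in use, with made-up names (`value`, `tol`, `steps`): keep halving a number until it drops below a tolerance, counting the steps along the way.

```r
# Hypothetical example: halve a value until it drops below a tolerance
value <- 1
tol <- 0.1
steps <- 0

while (value >= tol) {
  value <- value / 2
  steps <- steps + 1
}

value   # 0.0625, the first halving of 1 that is below 0.1
steps   # 4
```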


Example: Newton’s method for root-finding

The standard example for stopping conditions is Newton’s method for finding roots of equations in one variable

Newton’s method uses the derivative of the function to determine a linear direction towards a root (x s.t. f(x) = 0), i.e. it converges to at most one root per starting value and cannot find all roots of an equation


Example: Newton’s method

Arguments: function to be solved, derivative function, tolerance, initial value, maximum iterations

Return: root

What should the condition be?

Let x_i be the “guessed” root at the ith iteration

If |f(x_i)| > tolerance and the number of iterations is less than the maximum number of iterations, then it should continue.

Otherwise it should stop


Example: Newton’s method

newtons.method <- function( f, derivfunc,

tol=1e-3, init=0, maxiter=100){

x <- init

iter <- 1

while( (abs(f(x)) > tol) && (iter<=maxiter)){

print(x)

x <- -f(x)/derivfunc(x) + x

iter <- iter+1

}

x

}


Example code

> sinfunc <- function(x){ sin(x)*cos(x) }

> sinfunc.deriv <- function(y){ cos(y)*cos(y) - sin(y)*sin(y)}

> newtons.method(sinfunc, sinfunc.deriv, tol=1e-4, init=1)

[1] 1

[1] 2.09252

[1] 1.233947

[1] 1.633093

[1] 1.570472

[1] 1.570796
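As a second check with a root that is known exactly, the sketch below applies the same method to x^2 - 2, whose positive root is sqrt(2). The test functions `sqfunc` and `sqfunc.deriv` are made up for this illustration, and `newtons.method` is repeated with two small changes (no print() call, and && in the condition) so that the snippet runs on its own.

```r
# Newton's method as in these notes, with the print() call removed and
# && used in the loop condition (both are changes made for this sketch)
newtons.method <- function(f, derivfunc, tol = 1e-3, init = 0, maxiter = 100) {
  x <- init
  iter <- 1
  while ((abs(f(x)) > tol) && (iter <= maxiter)) {
    x <- x - f(x) / derivfunc(x)
    iter <- iter + 1
  }
  x
}

# Hypothetical test problem: the positive root of x^2 - 2 is sqrt(2)
sqfunc <- function(x) { x^2 - 2 }
sqfunc.deriv <- function(x) { 2 * x }

root <- newtons.method(sqfunc, sqfunc.deriv, tol = 1e-8, init = 1)
abs(root - sqrt(2)) < 1e-6   # TRUE if the iteration converged to sqrt(2)
```

Note that the starting value matters: with the default init = 0 the derivative 2 * x is zero, so the first update would divide by zero.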


Unofficial assignment

Try out some of these functions in R

Play around with some other functions like seq, rnorm

Think about how we could have written the trimmed mean function to take a matrix and sort each column


for loops

for loops are used in functions to repeat similar tasks a number of times

for loops in R are actually rather inefficient (short story: shopping for memory is like shopping for toilet paper - better to buy in bulk)

If necessary, then they are simple to implement

R for loops are actually rather intuitive


for loops

Structure/syntax of a for loop

for ( counter in vector ) {
    looped line 1
    looped line 2
    ...
}

The function will perform the list of looped lines length( vector ) times

The counter is a variable that is often used in the code for calculations or display
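A minimal sketch of a for loop that follows the "buy in bulk" advice: the result vector is allocated once, before the loop, instead of being grown one element at a time. The names `n` and `squares` are made up for this illustration.

```r
n <- 10

# Allocate the whole result vector once ("buy in bulk") ...
squares <- numeric(n)

# ... then fill it in; the counter i takes each value of 1:n in turn
for (i in 1:n) {
  squares[i] <- i^2
}

squares
```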


Example: Convolving two finite sequences

Convolutions pop up in certain statistical (and mathematical) computations

Imagine that we have two sequences, $a_0, a_1, \ldots, a_{m-1}$ and $b_0, b_1, \ldots, b_{n-1}$, and we want to find their convolution $ab_0, ab_1, \ldots, ab_{n+m-2}$, where

$$ab_k = \sum_{i=0}^{k} a_i \, b_{k-i}$$

for valid indices $i$ and $k-i$


convolve.rfun function

What are the necessary arguments?

What should be returned from the function?


convolve.rfun function

Arguments: a data vector, b data vector

Should return a vector of length length(a) + length(b) - 1

How do we calculate the convolution?


convolve.rfun function

convolve.rfun<-function(a,b){

ab<-rep(0,length(a)+length(b) -1)

for(i in 1:length(a)){

for(j in 1:length(b)){

ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]

}

}

ab

}


convolve.rfun function

> source("convolve.R")

> args(convolve.rfun)

function (a, b)

NULL

> testavec<-c(1:10)

> testbvec<-c(1:10)

> convolve.rfun(testavec,testbvec)

[1] 1 4 10 20 35 56 84 120 165 220 264 296 315

[14] 320 310 284 241 180 100


The big problem

Our new convolution function works pretty well for small sequences

The problem is that R is absolutely horrid with loops (as are most interpreted languages)

In fact, if we look at even a moderate sequence, we see that R can take a long time


Loops and speed issues in R

> testavec<-c(1:700)

> testbvec<-c(1:700)

> system.time(resultvec<-convolve.rfun(testavec,testbvec))

[1] 20.02 0.00 20.15 0.00 0.00


Loops and speed issues in R

We see from the system.time function that our convolution function takes about 20 seconds to run

This was not a very long pair of vectors to convolve, only 700 components each in length, and the function itself is trivial to write

We have two choices: we can either try to come up with sneaky R shortcuts to speed up the function (difficult to do in this case) or move a lot of the heavy looping into a C function
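One partial R shortcut, sketched here as an illustration rather than as the approach these notes go on to take, is to keep the outer loop but replace the inner loop with a single vectorized update. The name `convolve.vec` is made up for this sketch.

```r
# Sketch: same result as convolve.rfun, but the inner loop over b is replaced
# by one vectorized update per element of a
convolve.vec <- function(a, b) {
  n <- length(b)
  ab <- rep(0, length(a) + n - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + n - 1)            # positions that a[i] contributes to
    ab[idx] <- ab[idx] + a[i] * b   # add a[i] * b[j] for every j at once
  }
  ab
}

convolve.vec(1:10, 1:10)
```

This cuts the number of interpreted loop iterations from length(a) * length(b) to length(a). Base R also provides convolve(); according to its help page, the usual convolution of x and y is convolve(x, rev(y), type = "o").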


if/else statements

if statements allow you to control when certain statements will be executed in your function

if statements take the following form:

if( condition ) {
    calculation 1
    calculation 2
    ...
}

The function will only perform the calculations in the {}’s if the condition is true

if/else statements

If you add an else statement after the closing } of your if block, then whenever the condition is false, the function will execute what is in the else brackets

if( condition ) { if calculation } else { else calculation }
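A small sketch of if/else, including one nested inside another. The function name sign.label is hypothetical.

```r
# Hypothetical example: label a number by its sign.
sign.label <- function(x) {
  if (x > 0) {
    "positive"
  } else {
    if (x < 0) { "negative" } else { "zero" }
  }
}

sign.label(3)
sign.label(-2)
sign.label(0)
```

When typing at the console outside of any braces, else has to start on the same line as the closing } of the if block; otherwise R treats the if statement as already finished.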

running.mean.fast function

We’ll create a new function, running.mean.fast, that will calculate a running mean with a given window

Instead of recalculating the mean of all the observations each time we change the center of the window, we can just adjust the previous mean by dropping one observation and adding one observation

The stickiest part (again) will be the handling of the beginning and end of the vector

It helps if we divide it into three cases: beginning, middle, end

Dividing into cases

Beginning: Add one observation to the total sum and add one to the denominator

Middle: Add one observation at the end, subtract one from the beginning, and the denominator stays constant

End: Subtract one observation from the total sum and subtract one from the denominator
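To see the three cases concretely, the sketch below lists the window bounds and the denominator for every position. The values n = 7 and k = 1 (a window of size 3) are illustrative choices, not taken from the notes.

```r
# Window bounds for a centered window of half-width k on a vector of length n
# (n and k are illustrative values).
n <- 7
k <- 1
i <- 1:n
lower <- pmax(1, i - k)      # window cannot start before the first element
upper <- pmin(n, i + k)      # window cannot run past the last element
denominator <- upper - lower + 1
data.frame(i, lower, upper, denominator)
```

The denominator grows at the beginning, stays at 2*k+1 in the middle, and shrinks again at the end.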

running.mean.fast function

running.mean.fast <- function(datavector, windowsize = 3) {
    resultvector <- rep(0, length(datavector))
    k <- round((windowsize - 1) / 2)
    # get index corresponding to windowsize
    # Set initial sum
    # for(i in 1:length(datavector)){
    #     if(beginning) { do beginning calcs }
    #     else {
    #         if(middle) { do middle calcs }
    #         else { do end calcs }
    #     }
    # }
    resultvector
}

running.mean.fast function

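running.mean.fast <- function(datavector, windowsize = 3) {
    n <- length(datavector)
    resultvector <- rep(0, n)
    k <- round((windowsize - 1) / 2)    # half-width of the window
    # Sum of the first k observations (assumes n >= 2*k + 1)
    currentsum <- sum(datavector[seq_len(k)])
    for (i in 1:n) {
        if (i <= (k + 1)) {
            # Beginning: the window gains one observation on the right
            currentsum <- currentsum + datavector[i + k]
            resultvector[i] <- currentsum / (i + k)
        } else if (i <= (n - k)) {
            # Middle: add one on the right, drop one on the left
            currentsum <- currentsum + datavector[i + k] - datavector[i - k - 1]
            resultvector[i] <- currentsum / (2 * k + 1)
        } else {
            # End: the window loses one observation on the left
            currentsum <- currentsum - datavector[i - (k + 1)]
            resultvector[i] <- currentsum / (n - i + 1 + k)
        }
    }
    resultvector
}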
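One way to gain confidence in an incremental running mean is to compare it with a direct calculation. The sketch below defines a slow reference version and a loop-free alternative based on cumulative sums; both names (running.mean.naive, running.mean.cumsum) are hypothetical, and the cumsum approach is a different technique from the one on the slide.

```r
# Direct (slow) reference: recompute each window mean from scratch.
running.mean.naive <- function(datavector, windowsize = 3) {
  n <- length(datavector)
  k <- round((windowsize - 1) / 2)
  sapply(seq_len(n), function(i) mean(datavector[max(1, i - k):min(n, i + k)]))
}

# Loop-free alternative (not the method on the slide): window sums are
# differences of cumulative sums.
running.mean.cumsum <- function(datavector, windowsize = 3) {
  n <- length(datavector)
  k <- round((windowsize - 1) / 2)
  i <- seq_len(n)
  lower <- pmax(1, i - k)
  upper <- pmin(n, i + k)
  csum <- c(0, cumsum(datavector))   # csum[j + 1] holds the sum of the first j values
  (csum[upper + 1] - csum[lower]) / (upper - lower + 1)
}

x <- c(2, 4, 6, 8, 10, 12)
running.mean.naive(x, 3)
running.mean.cumsum(x, 3)
all.equal(running.mean.naive(x, 3), running.mean.cumsum(x, 3))
```

If running.mean.fast has also been defined, the same all.equal comparison against running.mean.naive is a convenient check of the beginning, middle, and end cases.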

Data frames

A data frame is much like a matrix, only more flexible

Allows the user to name columns, which makes operations, such as using the regression functions lm and glm, easier

User can also attach data frames, which gives even easier access to information

The easiest way to read tabular data into R is through read.table, which returns a data frame
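A short sketch of reading a table and using the column names. The data values and the names rawtext, trial, dose, and response are made up; the table is read from an inline string through the text argument of read.table so that no file is needed.

```r
# Small made-up table, read from an inline string instead of a file.
rawtext <- "
dose response
1    2.1
2    3.9
3    6.2
4    7.8
"
trial <- read.table(text = rawtext, header = TRUE)

is.data.frame(trial)
names(trial)

# Named columns make model formulas readable
fit <- lm(response ~ dose, data = trial)
coef(fit)

# with() gives access by name without attaching the frame
with(trial, mean(response))
```

For a real file, the first argument would be the file name instead of text.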

Example data frame: Ships data

Description from R help file:

Data frame giving the number of damage incidents and aggregate months of service by ship type, year of construction, and period of operation.

Using commands

> library(MASS)

> data(ships)

makes the frame available

Example data frame: Ships data

To get the names of the variables in the ships frame, use the names function

> names(ships)

[1] "type" "year" "period" "service" "incidents"

Typing ships$year accesses just the year variable

> ships$year

[1] 60 60 65 65 70 70 75 75 60 60 65 65 ....

[26] 60 65 65 70 70 75 75 60 60 65 65 70 ....

Example data frame: ships data

The variables themselves act like vectors and the data frames act like matrices

> ships$service[25:30]

[1] 251 105 288 192 349 1208

> ships[5,]

  type year period service incidents
5    A   70     60    1512         6

> ships[5,4]

[1] 1512

> ships[25:30,1:3]

type year period

25 D 60 60

26 D 60 75

27 D 65 60

28 D 65 75

29 D 70 60

30 D 70 75
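Beyond numeric indices, rows can be selected with a logical condition and columns by name. A brief sketch using the same ships data; the cutoff of 10000 months, the object name busy, and the new column name rate are arbitrary choices for illustration.

```r
library(MASS)   # the ships data frame lives in MASS
data(ships)

dim(ships)      # rows and columns, as for a matrix

# Rows can be picked by a logical condition, columns by name
busy <- ships[ships$service > 10000, c("type", "year", "service")]
busy

# A new column is added by assigning to a name that does not exist yet
ships$rate <- ships$incidents / ships$service
head(ships[ships$service > 0, ])
```

Rows with zero months of service give NaN for rate, which is why the last line filters on service > 0.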

Lists

Lists are different from data frames in that the different parts of a list can have different lengths, but you create data frames and lists from scratch in a similar way:

> data.frame(x1=c(45,12,13), x2=c(12))

  x1 x2
1 45 12
2 12 12
3 13 12

> list(x1=c(45,12,13), x2=c(12))

$x1

[1] 45 12 13

$x2

[1] 12
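A quick sketch of getting parts back out of a list. The object name mylist and the extra label part are hypothetical.

```r
mylist <- list(x1 = c(45, 12, 13), x2 = c(12))

mylist$x1                # access a part by name
mylist[["x2"]]           # double brackets also return the part itself
mylist["x2"]             # single brackets return a (shorter) list
sapply(mylist, length)   # the parts have different lengths

# Parts may even be of different types
mylist$label <- "a third part of a different type"
names(mylist)
```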

Lists and data frames

Notice how the data frame and the list have the same names for the variables (or parts), but the list allows for differing dimensions of the two variables (or parts)

Data frames are better used for data (duh) and lists are better used for complicated data structures returned by functions (e.g. lm returns a list that contains residuals, coefficients, etc., all of differing dimension)

Lists and data frames

> names(lm(incidents~year, data=ships))

[1] "coefficients" "residuals" "effects" "rank"

[5] "fitted.values" "assign" "qr" "df.residual"

[9] "xlevels" "call" "terms" "model"

> is.list(lm(incidents~year,data=ships))

[1] TRUE

> is.data.frame(lm(incidents~year,data=ships))

[1] FALSE
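Because the fitted model is a list, its parts can be pulled out with $ just like any other list. A brief sketch, with fit as an arbitrary object name:

```r
library(MASS)
data(ships)

fit <- lm(incidents ~ year, data = ships)

fit$coefficients        # a named vector: intercept and slope
length(fit$residuals)   # one residual per row of ships
coef(fit)               # extractor functions do the same job
class(fit)              # the class that plot() and summary() dispatch on
```

Extractor functions such as coef and residuals are usually preferred over $ because they keep working if the internal layout of the object changes.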

Plots

Plotting can be done at two levels

The first level is simple to use, but difficult to configure

The second level is very difficult to use, but allows you to control just about everything

Try to avoid throwing your keyboard on the floor while struggling to get used to the syntax
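A sketch contrasting the two levels, on the assumption that they correspond to R's high-level plot call and its low-level functions (lines, axis, box, legend). The curves chosen here are only for illustration.

```r
x <- seq(-3, 3, length = 200)

# High level: one call, defaults chosen for you
plot(x, dnorm(x), type = "l")

# Lower level: start from an empty frame and add each piece by hand
plot(x, dnorm(x), type = "n", axes = FALSE, xlab = "x", ylab = "density")
lines(x, dnorm(x), lty = 1)
lines(x, dt(x, df = 2), lty = 2)
axis(1, at = -3:3)
axis(2)
box()
legend("topright", legend = c("normal", "t, 2 df"), lty = c(1, 2))
```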

Basic plot command

Typing plot( object ) will produce a variety of results, depending on what you give the plot command

If you give it a vector, it plots each value vs. its vector index

If you give it a data frame, it defaults to the pairs command, which draws scatterplots of all pairs of variables

If you give it a linear model object, it will output four diagnostic plots (in current versions of R: residuals vs. fitted, normal q-q plot, scale-location, and residuals vs. leverage with Cook’s distance contours)
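The three behaviours can be tried directly. This sketch uses the ships data from MASS; the choice of columns and of the model incidents ~ service is arbitrary.

```r
library(MASS)
data(ships)

plot(ships$service)                                # vector: value against index
plot(ships[, c("year", "service", "incidents")])   # data frame: pairs plot

fit <- lm(incidents ~ service, data = ships)
par(mfrow = c(2, 2))    # 2 x 2 grid so all four diagnostic plots fit on one page
plot(fit)
par(mfrow = c(1, 1))    # restore the single-plot layout
```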

Examples of scatterplots/function plots

x<-runif(1000,-3,3)

y<- x + 3 + rnorm(1000,0,1)

plot(x,y)

[Figure: scatterplot of y against x; x-axis from −3 to 3, y-axis from −2 to 8]

Example 2

plot(x, y, pch = '.')

[Figure: the same scatterplot of y against x, drawn with small dots as plotting symbols]

Example 3

w<-seq(-3,3,length=1000)

plot(w, dnorm(w), type='l')

[Figure: line plot of dnorm(w) against w; x-axis from −3 to 3, y-axis from 0.0 to 0.4]

Titles, line types, labels, axes, etc.

> plot(w, dnorm(w), main='Your title goes here', type='l', lty=2)

[Figure: dashed line plot of dnorm(w) against w with the main title "Your title goes here"]

Titles, line types, labels, axes, etc.

> plot(w, dnorm(w), type='l', main='Title', xlab='X-axis label',
       ylab='Y-axis label')

[Figure: line plot of dnorm(w) against w with main title "Title", x-axis labelled "X-axis label", and y-axis labelled "Y-axis label"]

Titles, line types, labels, axes, etc.

> plot(w, dnorm(w), type="l", main="Title",

ylim=c(0,.3), xlim=c(-3,3))

[Figure: line plot of dnorm(w) against w with main title "Title"; the y-axis is cut off at 0.30 by ylim, so the top of the curve is not shown]

Histograms

> histvec<-rpois(1000,5)

> hist(histvec, main="hist(histvec)")

> hist(histvec, main="hist(histvec, nclass=10)", nclass=10)

[Figure: histogram of histvec with nclass=10; x-axis histvec from 0 to 14, y-axis Frequency from 0 to 150]
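The nclass argument is kept mainly for compatibility with S; the more general argument is breaks, which also accepts explicit cut points. A sketch for count data, with the seed and the object names brks and h chosen arbitrarily:

```r
set.seed(1)                      # arbitrary seed so the example is repeatable
histvec <- rpois(1000, 5)

# For counts, one bar per integer is often clearer: put breaks at half-integers
brks <- seq(-0.5, max(histvec) + 0.5, by = 1)
h <- hist(histvec, breaks = brks, main = "One bar per count")

h$counts                         # the bar heights are returned invisibly
table(histvec)                   # compare with a direct tabulation
```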

Arrays

Arrays are data structures that are best used for data with more than two dimensions, for instance, arrays of matrices

Array functions like apply will often work on matrices and data frames

For instance, if we use the attitude dataset (which ships with R’s built-in datasets package, so loading MASS is not strictly required), then

> library(MASS)

> data(attitude)

> apply(attitude, 2, mean)

rating complaints privileges learning

64.63333 66.60000 53.13333 56.36667

raises critical advance

64.63333 74.76667 42.93333
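A small sketch of a genuinely three-dimensional array, here four 2 x 3 matrices stacked together. The values 1:24 and the name arr are arbitrary.

```r
# Four 2 x 3 matrices stacked into one array (values are arbitrary)
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)
arr[, , 2]                  # the second matrix in the stack
apply(arr, 3, sum)          # one total per matrix
apply(arr, c(1, 2), mean)   # average the four matrices cell by cell
```

The second argument of apply names the dimensions to keep: 3 keeps the stacking dimension, while c(1, 2) keeps rows and columns and collapses over the stack.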

Why use apply?

apply should perform the function more efficiently than an explicit loop (although the speed-up is probably not as large as you would expect)

It also makes for cleaner code by cutting down on looping

> rdmmat<-matrix(rnorm(10000,4*(1:10)),10,1000)

> apply(rdmmat,1,mean)

[1] 4.003565 7.993293 11.983947 16.028533 19.939690 23.939630 27.981554

[8] 32.018374 36.047277 39.966829

> apply(rdmmat,1,var)

[1] 0.9376154 0.9554272 1.0340050 0.9289830 0.9847559 1.0051780 0.9676526

[8] 1.0440103 1.0359232 1.0346870
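To make the comparison concrete, the sketch below computes the same row means three ways: an explicit loop, apply, and the dedicated rowMeans function. The seed and the names loopmeans, applymeans, and fastmeans are arbitrary.

```r
set.seed(42)                              # arbitrary seed
rdmmat <- matrix(rnorm(10000, 4 * (1:10)), 10, 1000)

# Explicit loop over the rows
loopmeans <- numeric(nrow(rdmmat))
for (i in 1:nrow(rdmmat)) {
  loopmeans[i] <- mean(rdmmat[i, ])
}

applymeans <- apply(rdmmat, 1, mean)      # same calculation in one line
fastmeans  <- rowMeans(rdmmat)            # dedicated vectorized function

all.equal(loopmeans, applymeans)
all.equal(applymeans, fastmeans)
```

Wrapping each version in system.time on a larger matrix is an easy way to see how much, or how little, apply actually saves.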

That’s it!

If you need more help, feel free to contact the instructor or one of the TAs.

I would like to thank Professor Steele for generously providing an extensive R tutorial that I unscrupulously massacred into this.
