Introduction to Statistical Modelling Tools for Habitat Models Development, 26-28th Oct 2011, EURO-BASIN, www.euro-basin.eu

Introduction to R software, by Leire Ibaibarriaga


DESCRIPTION

Key lecture for the EURO-BASIN Training Workshop on Introduction to Statistical Modelling for Habitat Model Development, 26-28 Oct, AZTI-Tecnalia, Pasaia, Spain (www.euro-basin.eu)



OUTLINE

• What is R?

• Installation

• First session in R

• Working directory

• Getting help in R

• Editors and GUIs for R

• Installing and updating packages

• Useful packages for habitat modelling

• Documentation

• R language: type of objects, functions to manipulate them, …

• Import/export data

• Plots in R

• Linear models, generalised linear models

• Programming in R

IN THIS TALK

FOR BACKGROUND (very introductory)


WHAT IS R?

• R is a language and environment for statistical computing and graphics.

• R provides a wide variety of statistical and graphical techniques, and is highly extensible.

• R is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories by John Chambers and colleagues. R is also known as “GNU S”.

• R is completely free and it is available as Free Software under the terms of the Free Software Foundation GNU General Public License in source code form.

• It compiles and runs on a wide variety of UNIX platforms and similar systems, Windows and MacOS.

• R is object-oriented.

• R is mostly command-line driven (although various graphical interfaces have been developed).

• R has developed rapidly, and has been extended by a large collection of packages.

• Web page: www.r-project.org


INSTALLATION

• Sources, binaries and documentation for R can be obtained via CRAN, the “Comprehensive R Archive Network”

• For Windows:

Download the binary installer “R-2.13.2-win.exe”.

Just double-click on the icon and follow the instructions.

The default path is: “C:\Program Files\R\R-2.13.2”


FIRST SESSION IN R

• Ways to open a session in R:

1. If you double-click on the icon, “Rgui” (graphical user interface) will open

2. From a system window, execute “Rterm”

3. Open R from Tinn-R, Xemacs or similar.


WHAT DOES IT LOOK LIKE?

R CONSOLE


THE R CONSOLE

• > indicates that R is waiting for a new command

• + indicates that the previous command was incomplete and R continues reading

• Different commands are given on different lines, or on the same line separated by ;

• Comments start with #. Everything after this symbol is ignored by R

• R distinguishes capital and lower-case letters. “A” is not the same as “a”.

• Types of commands:

Expressions: the command is evaluated and the result is printed on the screen. Nothing is saved

3+2 or sum(3,2)

Assignments: the command is evaluated and the result is saved as an object using <-

Nothing is printed. Type the name of the object or use the function print() to see it.

a <- 3+2

a

print(a)
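The two types of command can be tried directly at the console. A minimal sketch; the object name `a` follows the slide, the remaining lines are illustrative additions:

```r
# Expression: evaluated and printed, nothing is saved
3 + 2

# Assignment: evaluated and saved, nothing is printed
a <- 3 + 2

# Two commands on one line, separated by ;
a; print(a)

# R is case sensitive: only "a" has been defined here, not "A"
exists("a")
```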


WORKING DIRECTORY

• To know the working directory of the current session type: getwd()

• To change the working directory: setwd("path/to/directory")

• Alternatively, execute R from the directory using a shortcut:

– Create a directory

– Right-click the R icon, go to “properties” and copy the “shortcut” path in “Start in”

• If we use Tinn-R, the default working directory is the one in which the R script is saved

• To save the current workspace use save.image(). By default the workspace will be called “.RData”. We can specify a name using save.image("myworkspace.RData")

• To quit an R session, q()

• Be careful: in Windows, paths should be given in either of these forms:

setwd("C:\\tmp\\Rcourse")

setwd("C:/tmp/Rcourse")
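The commands on this slide can be combined into a short session. This is only a sketch: the directory is created under `tempdir()` (an assumption made so the example runs anywhere), and `course_dir` and `old_wd` are illustrative names.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Hypothetical course directory inside the session's temporary directory
course_dir <- file.path(tempdir(), "Rcourse")
dir.create(course_dir, showWarnings = FALSE)
setwd(course_dir)

# Save the workspace under an explicit name in the new working directory
save.image("myworkspace.RData")
saved <- file.exists("myworkspace.RData")
saved

# Go back to the original working directory
setwd(old_wd)
```

`file.path()` builds the path with forward slashes, which R accepts on Windows as well.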


GETTING HELP

• ?sum

• help(sum)

• help("+")

• help.start()

• ?help.search

• help.search("linear models")

• ?apropos

• apropos("lm")

• ?demo

• demo(graphics); demo(persp)


EDITORS

• It is useful to have a text editor that allows us to keep the code scripts (with comments, ordered, etc.)

• Desirable properties for the text editor: syntax highlighting, parenthesis checking, etc., and the ability to send the code directly to R without copy-paste

• R for Windows has a small text editor. File/Open/New script. It links directly with R (select code and press Ctrl + R) but doesn’t offer syntax highlighting, etc.

• I use Tinn-R (only for Windows) http://sourceforge.net/projects/tinn-r

• Tinn-R needs R to run in SDI mode and the packages R2HTML and SciViews to be installed. The file Rprofile.site might need to be changed

• Other alternatives: Emacs/ESS, RStudio, Vim, jEdit, JGR, Eclipse, etc.

• See a complete list in: http://sciviews.org/_rgui/projects/Editors.html


GUIs for R

• The R command line interface (CLI) is powerful because it allows direct control over calculations and it is flexible. However, good knowledge of the language is required. The CLI is intimidating for beginners. The learning curve is typically longer than with a graphical user interface (GUI), although it is recognized that the effort is profitable and leads to better practice.

• Several projects are developing alternative user interfaces. See ongoing projects: http://sciviews.org/_rgui/

• An example: Rcmdr http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/


PACKAGES

• All R functions and datasets are stored in packages. Only when a package is loaded are its contents available.

• To see which packages are installed at your site, issue the command

> library()

• To load a particular package called “mgcv” use a command like

> library(mgcv)

• To see which packages are currently loaded, use

> search()

• Help for a specific package:

library(help="mgcv")

help("mgcv-package")

• Detach a loaded package:

detach("package:mgcv")
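The load, inspect and detach cycle can be checked with a short sketch. It uses the base package `splines` instead of `mgcv`, purely as an assumption so that the example does not depend on which contributed packages are installed:

```r
# splines is a base package, so it should be present in any R installation
library(splines)
loaded <- "package:splines" %in% search()
loaded

# Detach it again and confirm it has left the search path
detach("package:splines")
still_loaded <- "package:splines" %in% search()
still_loaded
```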


PACKAGES

• The standard (or base) packages are considered part of the R source code. They should be automatically available in any R installation.

• There are thousands of contributed packages for R, written by many different authors. Some of these packages implement specialized statistical methods, others give access to data or hardware, and others are designed to complement textbooks. Some (the recommended packages) are distributed with every binary distribution of R. The remaining packages must be downloaded individually from CRAN.

• The packages in CRAN can be installed and updated in two ways:

– From the Rgui menu

– From the R console using the commands:

install.packages("R2WinBUGS")

update.packages(oldPkgs="R2WinBUGS")


USEFUL PACKAGES

• http://cran.r-project.org/web/views/Environmetrics.html

• http://cran.r-project.org/web/views/MachineLearning.html

• http://cran.r-project.org/web/views/Spatial.html


DOCUMENTATION

• R manuals

• FAQs, Wiki,…

• Reference cards

• R News

• R Journal

• A lot of material on the web, e.g.:

The R graph Gallery: http://addictedtor.free.fr/graphiques/

R bloggers: http://www.r-bloggers.com/


OBJECTS

• Everything (almost) in R is an object

• They are the entities that are created and saved in an R session

• They can be numbers, characters, functions, vectors, matrices, etc

• ls() or objects() show the objects created

• rm(a) removes an object called “a”.


TYPES OF OBJECTS

• Vector: unidimensional collection of elements of the same type (numbers, TRUE/FALSE, characters, …)

• Matrix: Bidimensional collection of elements of the same type

• Array: multidimensional collection of elements of the same type

• Data frame: like a matrix, but allowing each column to be of a different type

• Functions: code

• Factor: categorical vector

• List: a generalised vector. Each component can be of different type and can at the same time have its own components
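One object of each type can be created as follows. This is an illustrative sketch; all object names and values are made up for the example.

```r
v  <- c(1.5, 2, 3)                                   # vector
m  <- matrix(1:6, nrow = 2)                          # matrix: 2 rows, 3 columns
ar <- array(1:24, dim = c(2, 3, 4))                  # array with three dimensions
df <- data.frame(id = 1:3, name = c("a", "b", "c"))  # columns of different types
f  <- factor(c("low", "high", "low"))                # categorical vector
l  <- list(number = 1, text = "a", nested = list(1, 2))  # components of any type

# class() reports what kind of object each one is
sapply(list(v, m, ar, df, f, l), function(obj) class(obj)[1])
```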


NUMERIC VECTORS

• ?c ; ?rep ; ?seq

• a <- c(1,2,3) # c: concatenate

• c(a,0,9,a)

• rep(1,3) # rep: repeat

• rep(a, each=3); rep(x=a, each=3)

• rep(a, times=3)

• rep( c(1,2), times=c(4,5))

• 1:7

• 7:1

• 2*1:5

• n <- 10; (1:n)-1; 1:n-1

• seq(-2,3,by=4) # seq: sequence

• seq(-2,3,length=4)

• seq(5)
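The pair `(1:n)-1` and `1:n-1` above is worth running, because the colon operator is evaluated before the subtraction. A small sketch (variable names are illustrative):

```r
n <- 10
with_parens    <- (1:n) - 1   # 0 1 2 ... 9
without_parens <- 1:n - 1     # also 0 1 2 ... 9, because : binds tighter than -
up_to_n_minus1 <- 1:(n - 1)   # 1 2 ... 9, which needs explicit parentheses

rep(c(1, 2), times = c(4, 5)) # four 1s followed by five 2s
seq(-2, 3, length = 4)        # four equally spaced values from -2 to 3
```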


ARITHMETIC OPERATORS

• + , - , * , / , ^ , %% , %/%

• log( ), exp( ), sqrt( ),

• sin( ), cos( ), tan( ), abs( )

• ?Arithmetic

• x <- 1:3

• 2*x

• 2^3

• (3*2)^2/log(4)

• sqrt(-1)

• log(10); log(10, base=10)
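The less familiar operators `%%` and `%/%`, and the result of `sqrt(-1)`, can be checked with a short sketch (the name `r` is illustrative):

```r
7 %/% 2               # integer division: 3
7 %% 2                # remainder: 1
x <- 1:3
2 * x                 # element-wise: 2 4 6
log(100, base = 10)   # 2

# sqrt(-1) returns NaN and issues a warning; the warning is silenced here
r <- suppressWarnings(sqrt(-1))
r
```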


LOGICAL VECTORS

• A logical value can be TRUE (T), FALSE (F) or NA (not available)

• a <- c(TRUE,FALSE,NA)

• b <- c(T,F,NA)

• rep(a,3)


LOGICAL OPERATORS

• < , > , == , >= , <= , & , | , !

• ?Logic

• ?Comparison

• x <- c(-3,0,6)

• x > 0

• x>=0

• x>0

• x<0

• x<=0

• x==0

• !x==0

• x < 2 & x > -1 ; x < 2 | x > -1

• any(x < 2) ; any(x > -10) ; all(x < 2); all(x > -10)
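For reference, these are the results the comparisons above give for `x <- c(-3,0,6)`. The last line is an added illustration of how logical values behave in arithmetic:

```r
x <- c(-3, 0, 6)
both   <- x < 2 & x > -1   # FALSE TRUE FALSE
either <- x < 2 | x > -1   # TRUE TRUE TRUE
!(x == 0)                  # TRUE FALSE TRUE
any(x < 2)                 # TRUE
all(x < 2)                 # FALSE
n_positive <- sum(x > 0)   # TRUE counts as 1, so this is 1
```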


MISSING VALUES

• NA (not available)

x <- c(1, 2, NA, 4)

is.na(x)

sum(x); sum(x, na.rm=T)

• NaN (not a number)

0/0; Inf - Inf

3/0 # Inf, not NaN

x <- c(3, NA, NaN)

is.na(x); is.nan(x)
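The difference between `NA` and `NaN` shows up in how `is.na()` and `is.nan()` treat them. A short sketch using the vectors from the slide (`y` is an illustrative name):

```r
x <- c(3, NA, NaN)
is.na(x)     # FALSE TRUE TRUE: NaN also counts as missing
is.nan(x)    # FALSE FALSE TRUE

y <- c(1, 2, NA, 4)
sum(y)                  # NA: a single missing value propagates
sum(y, na.rm = TRUE)    # 7
mean(y, na.rm = TRUE)   # 7/3
```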


CHARACTER VECTORS

• Character strings are delimited by double quotes " "

• \n new line, \t tab, \b backspace

• c("h","o","l","a")

• paste("h","o","l","a")

• paste("h","o","l","a", sep="")

• paste("x",1:3, sep="")

• nchar("hola")

• substring("hola", 1:4, 1:4)
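The effect of `sep` in `paste()` is easiest to see side by side. A sketch based on the calls above; the `collapse` argument and the `cat()` line are additions for illustration:

```r
spaced <- paste("h", "o", "l", "a")             # "h o l a": default separator is a space
joined <- paste("h", "o", "l", "a", sep = "")   # "hola"
labels <- paste("x", 1:3, sep = "")             # "x1" "x2" "x3"

letters_of_hola <- substring("hola", 1:4, 1:4)  # "h" "o" "l" "a"
rebuilt <- paste(letters_of_hola, collapse = "")  # back to "hola"

# Escape sequences are interpreted when the string is written with cat()
cat("one\ttwo\nthree\n")
```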


ACCESSING PART OF VECTOR

• vectorname[index]

• x <- seq(-1,7)

• y <- x <= 5

• x[1]; x[c(1,6)]; x[1:4]; x[c(2, 5:6)]

• x[y]; x[!y]; x[x > 0]

• x[-1]; x[-(3:4)]

• x[7] <- 0; x[3:4] <- c(11,9)

• x[y] <- NA

• is.na(x)

• x[is.na(x)] <- 0
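Some of the indexing commands above, with the values they produce. This is a sketch that follows the slide's `x` and `y`; note that `y` is computed once and is not updated when `x` changes:

```r
x <- seq(-1, 7)      # -1 0 1 2 3 4 5 6 7
y <- x <= 5          # TRUE for the first seven elements

x[c(1, 6)]           # -1 4
x[!y]                # 6 7
x[-(3:4)]            # everything except the 3rd and 4th elements

x[y] <- NA           # the first seven elements become NA
x[is.na(x)] <- 0     # replace the missing values by 0
x                    # 0 0 0 0 0 0 0 6 7
```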


FACTORS

• ?factor

• x <- c(rep("blue",2), "green", rep("red",4))

• x

• x <- factor(x)

• x

• z <- factor(substring("hola",1:4,1:4), levels=letters)

• z

• y <- factor(1:4)

• y
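What distinguishes a factor from a character vector is its set of levels. A brief sketch with the objects defined above; `levels()`, `table()` and `nlevels()` are added here for inspection:

```r
x <- factor(c(rep("blue", 2), "green", rep("red", 4)))
levels(x)        # "blue" "green" "red": sorted alphabetically by default
table(x)         # counts per level: 2, 1, 4
as.integer(x)    # internal integer codes: 1 1 2 3 3 3 3

# Levels can be fixed in advance; unused levels are kept
z <- factor(substring("hola", 1:4, 1:4), levels = letters)
nlevels(z)       # 26
```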


MATRICES

• ?matrix

• matrix(1:20, ncol=4, nrow=5, byrow=T)

• a <- matrix(1:20, ncol=4, nrow=5, byrow=T)

• dim(a); nrow(a); ncol(a);

• a[1,4]

• a[2,]

• a[,3]

• t(a) # transpose

• cbind(1, c(3,2), c(4,7)) # column combine

• rbind(1, c(3,2), c(4,7)) # row combine
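The matrix commands above give the following results. A sketch with the slide's matrix; `cb` and `rb` are illustrative names:

```r
a <- matrix(1:20, ncol = 4, nrow = 5, byrow = TRUE)
a[1, 4]      # 4
a[2, ]       # 5 6 7 8
a[, 3]       # 3 7 11 15 19
dim(t(a))    # 4 5

# The single value 1 is recycled to match the longer vectors
cb <- cbind(1, c(3, 2), c(4, 7))   # 2 x 3 matrix
rb <- rbind(1, c(3, 2), c(4, 7))   # 3 x 2 matrix
```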


LISTS

• ?list

• a <- list(country="China", measurements=c(34,38,32), station=34)

• a$country; a$measurements; a$station

• a[1]

• a[[1]]

• a[[2]][1]

• names(a)

• length(a)

• dim(a)
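The difference between `a[1]` and `a[[1]]` is a common source of confusion. A short sketch with the slide's list:

```r
a <- list(country = "China", measurements = c(34, 38, 32), station = 34)

class(a[1])         # "list": single brackets return a sub-list
class(a[[1]])       # "character": double brackets return the component itself
a[[2]][1]           # 34
a$measurements[1]   # the same element, accessed by name
length(a)           # 3
dim(a)              # NULL: a list has no dimensions
```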


DATA FRAME

• a <- data.frame(Long=rep(c(-3:-1), rep(11, 3)), Lat=rep(seq(43,48, by=0.5),3))

• names(a)

• dim(a)

• a$Long; a$Lat

• a[1,]; a[,1]

• a[a$Long==-1,]
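Rows of a data frame are usually selected with a logical condition on its columns. A sketch using the grid defined above; `east` and `north` are illustrative names:

```r
a <- data.frame(Long = rep(c(-3:-1), rep(11, 3)),
                Lat  = rep(seq(43, 48, by = 0.5), 3))
dim(a)                       # 33 2

# All grid points at longitude -1
east <- a[a$Long == -1, ]
nrow(east)                   # 11

# Conditions can be combined with & and |
north <- a[a$Lat > 47 & a$Long == -3, ]
north$Lat                    # 47.5 48.0
```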


FUNCTIONS

• Examples

ls()

a <- sum(1:6)

rm(a)

• General structure of a function:

name (arg1, arg2, arg3)

• The arguments can be given in order, name(arg1, arg2, arg3), or by name, name(arg2=x2, arg3=x3, arg1=x1)
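As an illustration of positional versus named arguments, the built-in function `seq()` (chosen here only as a convenient example) can be called in both ways:

```r
by_position <- seq(1, 10, 3)                    # from, to, by given in order
by_name     <- seq(by = 3, to = 10, from = 1)   # same call, arguments named in any order
mixed       <- seq(1, 10, length.out = 4)       # positional and named arguments mixed

by_position          # 1 4 7 10
args(seq.default)    # shows the argument names and their defaults
```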


READ R DATA FILES

• data()

• data(package="datasets")

• ls()

• data(trees)

• ?trees

• ls()

• names(trees)


READ DATA FROM FILES

• ?read.table

• A <- read.table("datos.txt", header=T)

• A <- read.table("C:/use/datos.txt", header=T)
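A complete export and import round trip can be sketched as follows. The file is written to `tempdir()` and the contents of `station_data` are invented for the example; `write.table()` is the export counterpart of `read.table()`.

```r
# Hypothetical data, written to a temporary text file
datos_file <- file.path(tempdir(), "datos.txt")
station_data <- data.frame(station = c("A", "B", "C"),
                           temp    = c(12.1, 13.4, 11.8))
write.table(station_data, file = datos_file, row.names = FALSE, quote = FALSE)

# header = TRUE tells read.table that the first line holds the column names
A <- read.table(datos_file, header = TRUE)
A
str(A)
```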


BASIC STATISTICS

• sum, mean, median, var, sd, quantile, min, max, range, sort, unique, summary

• data(iris)

• summary(iris)

• mean(iris$Sepal.Length)

• quantile(iris$Sepal.Length, 0.25)

• quantile(iris$Sepal.Length, seq(0,1,0.25))

• table(iris$Species) # contingency table

• tapply(iris$Sepal.Length, iris$Species, mean)

• cor(iris[, 1:4]) # correlation matrix (numeric columns only)
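Group summaries such as the `tapply()` call above can also be obtained with `aggregate()`, which returns a data frame. A sketch on the iris data (`means` and `cors` are illustrative names):

```r
data(iris)

# Mean sepal length per species, as a named vector
means <- tapply(iris$Sepal.Length, iris$Species, mean)
means

# The same summary as a data frame, using a formula
aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# Correlations need numeric columns, so the Species factor is left out
cors <- cor(iris[, 1:4])
round(cors, 2)
```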


HISTOGRAMS

• ?hist; ?barplot

• a <- rnorm(1000, 0, 1)

• hist(a)

• hist(a, breaks=10)

• hist(a, breaks=seq(-6,6))

• hist(a, breaks=10, prob=T)

• hist(a, breaks=10, prob=T, labels=T)

• hist(a, col=3)

• hist(a, border=4)

• b <- c(3,2,4,7,1,9)

• barplot(b)


STEM

• ?stem

• a <- rnorm(1000, 0, 1)

• stem(a)

• stem(a, scale=2)


BOXPLOTS

• ?boxplot

• b <- c(rep("A", 100), rep("B", 100), rep("C", 100), rep("D", 100), rep("E", 100))

• a <- rnorm(500)

• datos <- data.frame(a=a, b=b)

• boxplot(datos$a)

• boxplot(a ~ b, data=datos)

• boxplot(datos$a, notch=T)

• boxplot(datos$a, notch=T, col=2, border=4)

• boxplot(a ~ b, data=datos, col=1:5)


BOXPLOTS

• data(iris)

• names(iris)

• boxplot(iris$Petal.Length)

• boxplot(iris$Petal.Length, notch=T)

• boxplot(iris$Petal.Length, notch=T, col=2, border=4)

• boxplot(Petal.Length ~ Species, data=iris)

• boxplot(Petal.Length ~ Species, data=iris, col=2:4)

• boxplot(iris[,1:4], col=2:5)


QQPLOTS

• ?qqplot

• a <- rnorm(1000, 0, 1)

• qqnorm(a)

• qqline(a, col=2)

• data(precip)

• qqnorm(precip, ylab = "Precipitation [in/yr] for 70 US cities", col=2)


CONDITIONED PLOTS

• ?plot.default; ?pairs; ?coplot

• pairs(iris)

• pairs(iris[, 1:4])

• pairs(iris[, 1:4], panel = panel.smooth, main = "Iris data")

• coplot(Petal.Width ~ Petal.Length | Species, data=iris, rows=1)

• coplot(Petal.Width ~ Petal.Length | Sepal.Length, data=iris, rows=1)

• coplot(Petal.Width ~ Petal.Length | Sepal.Length, given.values=co.intervals(iris$Sepal.Length, 3), data=iris, rows=1)


PLOT

• ?plot; ?plot.default

• a <- rnorm(1000, 0, 1)

• plot(a)

• plot(a, type="l")

• plot(a, type="h")

• plot(a, type="b")

• plot(a, col=2); plot(a, col="red")

• plot(a, pch="*")

• plot(a, pch=2)

• plot(a, pch=3, cex=0.6)

• plot(a, pch=3, cex=0.6, col=6, xlab=" ", ylab="a", main="Residuals")


PLOT

• data(cars)

• plot(cars)

• plot(cars$speed, cars$dist)

• plot(cars$dist, cars$speed)

• plot(cars, type="p", pch=2, col=3, xlab="Velocity", ylab="Distance", main="About cars", xlim=c(10,20), ylim=c(0,80))

• ?par


INTERACTING WITH FIGURES

• ?identify

• ?locator

• plot(cars)

• identify(cars, labels=cars$speed)

• locator(1)

• locator(10)


ADD LINES AND POINTS

• ?lines; ?points; ?abline; ?text

• plot(cars, type="n")

• points(cars$speed[1:10], cars$dist[1:10], col=6)

• lines(cars)

• lines(cars$speed[1:10], cars$dist[1:10], col=4)

• lines(lowess(cars))

• abline(v=10)

• abline(h=40)

• abline(a=0, b=1)

• text(cars$speed, cars$dist, labels=cars$dist)


ADD LEGENDS

• ?legend

• boxplot(iris$Sepal.Length ~ iris$Species, col=2:4)

• legend(1, 8, levels(iris$Species), fill=2:4)

• boxplot(iris$Sepal.Length ~ iris$Species, col=2:4)

• loc <- locator(1)

• legend(loc, levels(iris$Species), fill=2:4)


LINEAR REGRESSION

• ?lm

• mod.lm <- lm(Petal.Width~Petal.Length, data=iris)

• mod2.lm <- lm(Petal.Width~Petal.Length - 1, data=iris)

• mod3.lm <- lm(Petal.Width~Petal.Length + Species, data=iris)

• mod4.lm <- lm(Petal.Width~Petal.Length * Species, data=iris)

• mod5.lm <- lm(Petal.Width~Petal.Length, subset=Species=="setosa", data=iris)


LINEAR REGRESSION

• mod.lm

• summary(mod.lm)

• coef(mod.lm)

• residuals(mod.lm)

• predict(mod.lm)

• names(mod.lm)

• plot(iris$Petal.Length, iris$Petal.Width, type="n")

• points(iris$Petal.Length[iris$Species=="setosa"], iris$Petal.Width[iris$Species=="setosa"], col=2)

• abline(coef(mod.lm))
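A fitted model is often used to predict at new values and to compare nested models. The sketch below reuses `mod.lm` and `mod3.lm` from the previous slide; the data frame `new_flowers` and its values are invented for the example.

```r
data(iris)
mod.lm  <- lm(Petal.Width ~ Petal.Length, data = iris)
mod3.lm <- lm(Petal.Width ~ Petal.Length + Species, data = iris)

# Intercept and slope of the simple regression
coef(mod.lm)

# Predictions for hypothetical new petal lengths
new_flowers <- data.frame(Petal.Length = c(1.5, 4, 6))
pred <- predict(mod.lm, newdata = new_flowers)
pred

# F-test comparing the model with and without the Species effect
anova(mod.lm, mod3.lm)
```

The column name in `newdata` must match the predictor name used in the formula.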


R PROGRAMMING

• if (condition) { instructions }

• if (condition) { instructions } else { instructions }

• while (condition) { instructions }

• for (i in index) { instructions }
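The control structures above in a small runnable sketch; the vector `x` and the counters are illustrative. Note that the keywords are lower case and that `else` follows the closing brace on the same line.

```r
x <- c(-3, 0, 6)

# Loop over the positions of x and branch on the sign of each element
for (i in seq_along(x)) {
  if (x[i] > 0) {
    cat(x[i], "is positive\n")
  } else {
    cat(x[i], "is not positive\n")
  }
}

# Sum the integers 1 to 5 with a while loop
total <- 0
i <- 1
while (i <= 5) {
  total <- total + i
  i <- i + 1
}
total   # 15
```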


OWN FUNCTIONS

• Open a text editor to create or correct a function: fix(name)

• Structure:

function(arg1, arg2, arg3) {

instructions

return(result)

}
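As an example of this structure, here is a small user-defined function. The name `cv` and the choice of the coefficient of variation are only for illustration:

```r
# Hypothetical helper: coefficient of variation (standard deviation / mean)
cv <- function(x, na.rm = FALSE) {
  result <- sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
  return(result)
}

cv(c(2, 4, 6))                     # sd is 2 and mean is 4, so 0.5
cv(c(2, 4, NA, 6), na.rm = TRUE)   # missing value dropped, again 0.5
```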
