Introduction to R software, by Leire Ibaibarriaga

DESCRIPTION

Key lecture for the EURO-BASIN Training Workshop on Introduction to Statistical Modelling for Habitat Model Development, 26-28 Oct, AZTI-Tecnalia, Pasaia, Spain (www.euro-basin.eu)

Introduction to Statistical Modelling Tools for Habitat Models Development, 26-28th Oct 2011, EURO-BASIN, www.euro-basin.eu

OUTLINE

• What is R?

• Installation

• First session in R

• Working directory

• Getting help in R

• Editors and GUIs for R

• Installing and updating packages

• Useful packages for habitat modelling

• Documentation

• R language: types of objects, functions to manipulate them, …

• Import/export data

• Plots in R

• Linear models, generalised linear models

• Programming in R

IN THIS TALK

FOR BACKGROUND (very introductory)

WHAT IS R?

• R is a language and environment for statistical computing and graphics.

• R provides a wide variety of statistical and graphical techniques, and is highly extensible.

• R is a GNU project which is similar to the S language and environment, which was developed at Bell Laboratories by John Chambers and colleagues. R is also known as “GNU S”.

• R is completely free: it is available as Free Software, in source code form, under the terms of the Free Software Foundation's GNU General Public License.

• It compiles and runs on a wide variety of UNIX platforms and similar systems, Windows and MacOS.

• R is object-oriented.

• R is mostly command-line driven (although various graphical interfaces have been developed).

• R has developed rapidly, and has been extended by a large collection of packages.

• Web page: www.r-project.org

INSTALLATION

• Sources, binaries and documentation for R can be obtained via CRAN, the “Comprehensive R Archive Network”

• For Windows:

Download the binary installer “R-2.13.2-win.exe”.

Double-click on it and follow the instructions.

The default installation path is: “C:\Program Files\R\R-2.13.2”

FIRST SESSION IN R

• Ways to open a session in R:

1. If you double-click on the icon, “Rgui” (graphical user interface) will open

2. From a command prompt (system window), execute “Rterm”

3. Open R from Tinn-R, Xemacs or similar.

WHAT DOES IT LOOK LIKE?

[Figure: screenshot of the R console]

THE R CONSOLE

• > indicates that R is waiting for a new command

• + indicates that the previous command was incomplete and R is waiting for the rest of it

• Commands can be given on separate lines, or on the same line separated by ;

• Comments start with #. Everything after this symbol on the same line is ignored by R

• R distinguishes capital and lower-case letters: “A” is not the same as “a”.

• Types of commands:

Expressions: the command is evaluated and the result is printed on the screen. Nothing is saved

3+2 or sum(3,2)

Assignments: the command is evaluated and the result is saved as an object using <-

Nothing is printed. To see the result, type the name of the object or

use the function print().

a <- 3+2

a

print(a)
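A minimal sketch of the console rules above, written as a script that can be pasted into the console. The object names `b`, `B` and `total` are arbitrary choices for illustration.

```r
# Expression: evaluated and printed, nothing is stored
3 + 2
sum(3, 2)

# Assignment: evaluated and stored, nothing is printed
a <- 3 + 2
a          # typing the name prints the object
print(a)   # the same, explicitly

# Two commands on one line, separated by ;
b <- 1; B <- 2   # R is case-sensitive: b and B are different objects

# A command split over two lines: R shows the + prompt until it is complete
total <- a +
  b
```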

WORKING DIRECTORY

• To find out the working directory of the current session, type: getwd()

• To change the working directory: setwd(dir), where dir is the path given as a character string

• Alternatively, execute R from the directory using a shortcut:

– Create a directory

– Right-click the R icon, go to “Properties” and paste the path of the new directory into the “Start in” field

• If we use Tinn-R, the default working directory is the one in which the R script is saved

• To save the current workspace use save.image(). By default the workspace is saved in a file called “.RData”. We can specify a name using save.image("myworkspace.RData")

• To quit an R session, q()

• Be careful: in Windows, backslashes in paths must be doubled, or forward slashes used instead. Either of these works:

setwd("C:\\tmp\\Rcourse")

setwd("C:/tmp/Rcourse")
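A sketch of a round trip with the working directory and the workspace. It assumes a temporary directory from tempdir() instead of a real project folder, and uses load() (not shown in the slides) to read the saved workspace back.

```r
old <- getwd()                    # remember the current working directory
tmp <- tempdir()                  # temporary directory, used here only for illustration
setwd(tmp)                        # change the working directory
x <- 1:10
save.image("myworkspace.RData")   # save all objects of the workspace in tmp
rm(x)                             # remove x from the session...
load("myworkspace.RData")         # ...and restore it from the saved file
setwd(old)                        # go back to the original directory
```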

GETTING HELP

• ?sum

• help(sum)

• help("+")

• help.start()

• ?help.search

• help.search("linear models")

• ?apropos

• apropos("lm")

• ?demo

• demo(graphics); demo(persp)

EDITORS

• It is useful to have a text editor in which to keep code scripts (with comments, well ordered, etc.)

• Desirable properties for the text editor: syntax highlighting, parenthesis matching, etc., and the ability to send code directly to R (without copy-paste)

• R for Windows has a small text editor (File/New script or File/Open script). It links directly with R (select code and press Ctrl + R) but doesn't provide syntax highlighting, etc.

• I use Tinn-R (only for Windows) http://sourceforge.net/projects/tinn-r

• It requires R to run in SDI mode and the packages R2HTML and SciViews to be installed. It might also be necessary to edit the file Rprofile.site

• Other alternatives: Emacs/ESS, RStudio, Vim, jEdit, JGR, Eclipse

• See a complete list in: http://sciviews.org/_rgui/projects/Editors.html

GUIs for R

• The R command line interface (CLI) is powerful because it allows direct control over calculations and it is flexible. However, good knowledge of the language is required. The CLI is intimidating for beginners, and the learning curve is typically longer than with a graphical user interface (GUI), although it is recognized that the effort is profitable and leads to better practice.

• Several projects are developing alternative user interfaces. See the ongoing projects at: http://sciviews.org/_rgui/

• An example: R Commander (package Rcmdr) http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/

PACKAGES

• All R functions and datasets are stored in packages. Only when a package is loaded are its contents available.

• To see which packages are installed at your site, issue the command

> library()

• To load a particular package called “mgcv” use a command like

> library(mgcv)

• To see which packages are currently loaded, use

> search()

• Help for a specific package:

library(help="mgcv")

help("mgcv-package")

• Detach a loaded package:

detach("package:mgcv")
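A sketch of loading and detaching a package and checking the search path. It uses splines instead of mgcv only because splines is a base package that ships with every R installation.

```r
library(splines)                                # load (attach) the package
loaded <- "package:splines" %in% search()       # TRUE: it now appears in the search path
detach("package:splines")                       # detach it again
unloaded <- !("package:splines" %in% search())  # TRUE: it is gone from the search path
```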

PACKAGES

• The standard (or base) packages are considered part of the R source code. They should be automatically available in any R installation.

• There are thousands of contributed packages for R, written by many different authors. Some of these packages implement specialized statistical methods, others give access to data or hardware, and others are designed to complement textbooks. Some (the recommended packages) are distributed with every binary distribution of R. The remaining packages have to be downloaded individually from CRAN.

• The packages in CRAN can be installed and updated in two ways:

– From the Rgui menu

– From the R console using the commands:

install.packages("R2WinBUGS")

update.packages(oldPkgs = "R2WinBUGS")

USEFUL PACKAGES

• http://cran.r-project.org/web/views/Environmetrics.html

• http://cran.r-project.org/web/views/MachineLearning.html

• http://cran.r-project.org/web/views/Spatial.html

DOCUMENTATION

• R manuals

• FAQs, Wiki,…

• Reference cards

• R News

• R Journal

• A lot of material in the web, e.g.:

The R graph Gallery: http://addictedtor.free.fr/graphiques/

R bloggers: http://www.r-bloggers.com/

OBJECTS

• Almost everything in R is an object

• Objects are the entities that are created and saved in an R session

• They can be numbers, characters, functions, vectors, matrices, etc.

• ls() or objects() shows the objects created

• rm(a) removes an object called “a”.

TYPE OF OBJECTS

• Vector: one-dimensional collection of elements of the same type (numbers, TRUE/FALSE, characters, …)

• Matrix: two-dimensional collection of elements of the same type

• Array: multidimensional collection of elements of the same type

• Data frame: two-dimensional like a matrix, but each column can be of a different type

• Function: code that takes arguments and returns a result

• Factor: categorical vector

• List: a generalised vector. Each component can be of a different type and can itself have its own components
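A short sketch creating one object of each type listed above and then listing and removing objects. All object names and values are illustrative only; is.matrix(), is.data.frame() and similar functions test the type of an object.

```r
v  <- c(1.5, 2, 3)                                   # numeric vector
m  <- matrix(1:6, nrow = 2)                          # matrix
ar <- array(1:24, dim = c(2, 3, 4))                  # three-dimensional array
df <- data.frame(id = 1:3, name = c("a", "b", "c"))  # columns of different types
f  <- factor(c("low", "high", "low"))                # categorical vector
l  <- list(n = 1, txt = "x", sub = list(k = TRUE))   # a list that contains a list
g  <- function(x) x^2                                # a function is also an object

class(df); class(f); class(l); class(g)
ls()          # the objects created so far
rm(v)         # remove one of them
exists("v")   # FALSE
```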

NUMERIC VECTORS

• ?c ; ?rep ; ?seq

• a <- c(1,2,3) # c: concatenate

• c(a,0,9,a)

• rep(1,3) # rep: repeat

• rep(a, each=3); rep(x=a, each=3)

• rep(a, times=3)

• rep( c(1,2), times=c(4,5))

• 1:7

• 7:1

• 2*1:5

• n <- 10; (1:n)-1; 1:n-1

• seq(-2,3,by=4) # seq: sequence

• seq(-2,3,length=4)

• seq(5)
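A sketch of the commands above with their results written as comments. The line 1:(n - 1) is an addition that shows how brackets change the precedence of the colon operator.

```r
a <- c(1, 2, 3)
c(a, 0, 9, a)                  # 1 2 3 0 9 1 2 3
rep(a, each = 3)               # 1 1 1 2 2 2 3 3 3
rep(a, times = 3)              # 1 2 3 1 2 3 1 2 3
rep(c(1, 2), times = c(4, 5))  # 1 1 1 1 2 2 2 2 2
n <- 10
(1:n) - 1                      # 0 1 ... 9
1:n - 1                        # the same: ":" is evaluated before "-"
1:(n - 1)                      # 1 2 ... 9: brackets change the order
seq(-2, 3, by = 4)             # -2 2
seq(-2, 3, length.out = 4)     # -2.000 -0.333 1.333 3.000
```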

ARITHMETIC OPERATORS

• +, -, *, /, ^, %%, %/%

• log(), exp(), sqrt()

• sin(), cos(), tan(), abs()

• ?Arithmetic

• x <- 1:3

• 2*x

• 2^3

• (3*2)^2/log(4)

• sqrt(-1)

• log(10); log(10, base=10)
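A sketch of the less familiar operators in the list: %% (remainder) and %/% (integer division), and the fact that arithmetic on vectors works element by element.

```r
x <- 1:3
2 * x                 # 2 4 6: applied element by element
x + c(10, 20, 30)     # 11 22 33
7 %% 3                # 1: remainder of the integer division (modulo)
7 %/% 3               # 2: integer division
log(100, base = 10)   # 2
exp(log(5))           # 5 (up to rounding)
```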

LOGICAL VECTORS

• A logical value can be TRUE (T), FALSE (F) or NA (not available)

• a <- c(TRUE,FALSE,NA)

• b <- c(T,F,NA)

• rep(a,3)

LOGICAL OPERATORS

• <, >, ==, >=, <=, &, |, !

• ?Logic

• ?Comparison

• x <- c(-3,0,6)

• x > 0

• x>=0

• x<0

• x<=0

• x==0

• !x==0

• x < 2 & x > -1 ; x < 2 | x > -1

• any(x < 2) ; any(x > -10) ; all(x < 2); all(x > -10)
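A sketch of the logical operators above with their results. Two points worth noting: sum() counts the TRUE values, and & and | follow three-valued logic when NA is involved.

```r
x <- c(-3, 0, 6)
x > 0                    # FALSE FALSE TRUE
x < 2 & x > -1           # element-wise AND: FALSE TRUE FALSE
x < 2 | x > -1           # element-wise OR:  TRUE TRUE TRUE
sum(x > 0)               # 1: TRUE counts as 1 and FALSE as 0
any(x < 2); all(x < 2)   # TRUE FALSE
c(TRUE, NA) & FALSE      # FALSE FALSE: FALSE & anything is FALSE
c(TRUE, NA) | FALSE      # TRUE NA
```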

MISSING VALUES

• NA (not available):

x <- c(1, 2, NA, 4)

is.na(x)

sum(x); sum(x, na.rm=T)

• NaN (not a number): 0/0; Inf - Inf

3/0 # gives Inf (infinity), not NaN

x <- c(3, NA, NaN)

is.na(x); is.nan(x)
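A sketch of how NA and NaN behave in practice: a single NA makes a sum unknown unless na.rm = TRUE, and is.na() is also TRUE for NaN while is.nan() is TRUE only for NaN.

```r
x <- c(1, 2, NA, 4)
sum(x)                  # NA: one missing value makes the sum unknown
sum(x, na.rm = TRUE)    # 7
mean(x, na.rm = TRUE)   # 7/3
y <- c(3, NA, NaN)
is.na(y)                # FALSE TRUE TRUE: NaN also counts as missing
is.nan(y)               # FALSE FALSE TRUE
c(0/0, 3/0, -3/0)       # NaN Inf -Inf
```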

CHARACTER VECTORS

• Character strings are delimited by double quotes " " (single quotes ' ' can also be used)

• \n new line, \t tab, \b backspace

• c("h","o","l","a")

• paste("h","o","l","a")

• paste("h","o","l","a", sep="")

• paste("x",1:3, sep="")

• nchar("hola")

• substring("hola", 1:4, 1:4)
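A sketch of the string functions above with their results. The collapse argument of paste(), which joins the elements of a single vector, is an addition not covered in the slide.

```r
paste("h", "o", "l", "a")                    # "h o l a": default separator is a space
paste("h", "o", "l", "a", sep = "")          # "hola"
paste("x", 1:3, sep = "")                    # "x1" "x2" "x3"
paste(c("h", "o", "l", "a"), collapse = "")  # "hola": joins the elements of one vector
nchar("hola")                                # 4
substring("hola", 1:4, 1:4)                  # "h" "o" "l" "a"
cat("one\ttwo\nthree\n")                     # escape sequences are interpreted by cat()
```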

ACCESSING PART OF VECTOR

• vectorname[index]

• x <- seq(-1,7)

• y <- x <= 5

• x[1]; x[c(1,6)]; x[1:4]; x[c(2, 5:6)]

• x[y]; x[!y]; x[x > 0]

• x[-1]; x[-(3:4)]

• x[7] <- 0; x[3:4] <- c(11,9)

• x[y] <- NA

• is.na(x)

• x[is.na(x)] <- 0
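A sketch of the three kinds of index used above (positive, negative and logical), with the resulting values as comments. which() is an addition that returns positions rather than values.

```r
x <- seq(-1, 7)        # -1 0 1 2 3 4 5 6 7
y <- x <= 5            # logical index
x[c(2, 5:6)]           # 0 3 4: positive indices select
x[-(3:4)]              # -1 0 3 4 5 6 7: negative indices drop
x[!y]                  # 6 7: logical indices keep the TRUE positions
which(x > 4)           # 7 8 9: positions, not values
x[y] <- NA             # assign into the selected positions
x[is.na(x)] <- 0       # replace the missing values
x                      # 0 0 0 0 0 0 0 6 7
```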

FACTORS

• ?factor

• x <- c(rep("blue",2), "green", rep("red",4))

• x

• x <- factor(x)

• x

• z <- factor(substring("hola",1:4,1:4),

levels=letters)

• z

• y <- factor(1:4)

• y
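A sketch of what a factor stores: a set of levels (sorted alphabetically by default) and one integer code per element. droplevels() is an addition that removes the unused levels from z.

```r
x <- factor(c(rep("blue", 2), "green", rep("red", 4)))
levels(x)        # "blue" "green" "red"
as.integer(x)    # 1 1 2 3 3 3 3: the integer codes
table(x)         # counts per level: blue 2, green 1, red 4
z <- factor(substring("hola", 1:4, 1:4), levels = letters)
nlevels(z)       # 26: unused levels are kept
droplevels(z)    # only the levels that occur: a h l o
```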

MATRICES

• ?matrix

• matrix(1:20, ncol=4, nrow=5, byrow=T)

• a <- matrix(1:20, ncol=4, nrow=5, byrow=T)

• dim(a); nrow(a); ncol(a);

• a[1,4]

• a[2,]

• a[,3]

• t(a) # transpose

• cbind(1, c(3,2), c(4,7)) # column combine

• rbind(1, c(3,2), c(4,7)) # row combine
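A sketch of matrix indexing with the matrix defined above, plus a second matrix b (an addition) that shows the difference between filling by rows and the default filling by columns.

```r
a <- matrix(1:20, ncol = 4, nrow = 5, byrow = TRUE)  # filled row by row
a[1, 4]        # 4
a[2, ]         # 5 6 7 8: a whole row
a[, 3]         # 3 7 11 15 19: a whole column
dim(t(a))      # 4 5
b <- matrix(1:20, ncol = 4, nrow = 5)  # default byrow = FALSE: filled column by column
b[1, 4]        # 16
cbind(1, c(3, 2), c(4, 7))   # 2 x 3: the single 1 is recycled down the first column
```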

LISTS

• ?list

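• a <- list(country="China", measurements=c(34,38,32), station=34)

• a$country; a$measurements; a$station

• a[1] # a list containing only the first component

• a[[1]] # the first component itself

• a[[2]][1]

• names(a)

• length(a)

• dim(a) # NULL: a list has no dimensions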
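A sketch of the difference between [ ] and [[ ]] on the list above, and of adding a component by name. The component depth is hypothetical and only used for illustration.

```r
a <- list(country = "China", measurements = c(34, 38, 32), station = 34)
a[1]               # a list of length 1 containing the component "country"
a[[1]]             # the component itself: "China"
a$measurements[2]  # 38: the same as a[[2]][2]
a$depth <- 120     # hypothetical new component, added by name
names(a)           # "country" "measurements" "station" "depth"
length(a)          # 4
```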

DATA FRAME

• a <- data.frame(Long=rep(c(-3:-1), rep(11, 3)), Lat=rep(seq(43,48, by=0.5),3))

• names(a)

• dim(a)

• a$Long; a$Lat

• a[1,]; a[,1]

• a[a$Long==-1,]
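A sketch checking the size of the data frame above and selecting rows. subset() is an addition that expresses the same row selection without repeating a$.

```r
a <- data.frame(Long = rep(-3:-1, rep(11, 3)), Lat = rep(seq(43, 48, by = 0.5), 3))
dim(a)                             # 33 2: 3 longitudes x 11 latitudes
a[a$Long == -1, ]                  # the 11 rows where Long is -1
subset(a, Long == -1 & Lat > 47)   # the same idea with subset(): 2 rows
```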

FUNCTIONS

• Examples

ls()

a <- sum(1:6)

rm(a)

• General structure of a function call:

name(arg1, arg2, arg3)

• The arguments can be given in order, name(arg1, arg2, arg3), or by name, name(arg2=x2, arg3=x3, arg1=x1)
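A sketch of positional and named arguments using seq(), whose first three arguments are from, to and by. args() shows the arguments of a function together with their default values.

```r
seq(1, 10, 3)                   # by position: from = 1, to = 10, by = 3 -> 1 4 7 10
seq(by = 3, to = 10, from = 1)  # by name: the order does not matter
seq(1, 10, by = 3)              # mixing both is common
args(seq.default)               # the arguments and their default values
```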

READ R DATA FILES

• data()

• data(package="datasets")

• ls()

• data(trees)

• ?trees

• ls()

• names(trees)

READ DATA FROM FILES

• ?read.table

• A <- read.table("datos.txt", header=T)

• A <- read.table("C:/use/datos.txt", header=T)
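A sketch of exporting a small data frame and reading it back. The file is created in a temporary directory and the station and temperature values are made up for illustration; write.table() and write.csv() are the export counterparts of read.table() and read.csv().

```r
f <- file.path(tempdir(), "datos.txt")   # temporary file, only for illustration
df <- data.frame(station = c(1, 2, 3), temp = c(12.5, 13.1, 11.8))
write.table(df, f, row.names = FALSE)    # export: a header line plus one line per row
A <- read.table(f, header = TRUE)        # import it again
str(A)
g <- file.path(tempdir(), "datos.csv")
write.csv(df, g, row.names = FALSE)      # comma-separated variant
B <- read.csv(g)
```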

BASIC STATISTICS

• sum, mean, median, var, sd, quantile, min, max, range, sort, unique, summary

• data(iris)

• summary(iris)

• mean(iris$Sepal.Length)

• quantile(iris$Sepal.Length, 0.25)

• quantile(iris$Sepal.Length, seq(0,1,0.25))

• table(iris$Species) # contingency table

• tapply(iris$Sepal.Length, iris$Species, mean)

• cor(iris[, 1:4]) # correlation matrix of the numeric columns
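A sketch of the summaries above on the iris data, with the values they return. aggregate() is an addition that gives the same group means as tapply() but as a data frame.

```r
data(iris)
range(iris$Sepal.Length)                       # 4.3 7.9
quantile(iris$Sepal.Length, c(0.25, 0.5))      # 5.1 5.8
table(iris$Species)                            # 50 flowers of each species
tapply(iris$Sepal.Length, iris$Species, mean)  # 5.006 5.936 6.588
aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
round(cor(iris[, 1:4]), 2)                     # only the numeric columns
```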

HISTOGRAMS

• ?hist; ?barplot

• a <- rnorm(1000, 0, 1)

• hist(a)

• hist(a, breaks=10)

• hist(a, breaks=seq(-6,6))

• hist(a, breaks=10, prob=T)

• hist(a, breaks=10, prob=T, labels=T)

• hist(a, col=3)

• hist(a, border=4)

• b <- c(3,2,4,7,1,9)

• barplot(b)
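A sketch that looks at the numbers behind a histogram. It assumes set.seed(1) for reproducibility and uses plot = FALSE, which makes hist() return the class limits and counts without drawing anything.

```r
set.seed(1)                              # make the random numbers reproducible
a <- rnorm(1000, 0, 1)
h <- hist(a, breaks = 10, plot = FALSE)  # compute the histogram without drawing it
h$breaks                                 # class limits ("pretty" values near 10 classes)
h$counts                                 # number of observations per class
sum(h$counts)                            # 1000
sum(h$density * diff(h$breaks))          # 1: with prob = TRUE the bars show this density
```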

STEM

• ?stem

• a <- rnorm(1000, 0, 1)

• stem(a)

• stem(a, scale=2)

BOXPLOTS

• ?boxplot

• b <- c(rep("A", 100), rep("B", 100), rep("C", 100), rep("D", 100), rep("E", 100))

• a <- rnorm(500)

• datos <- data.frame(a=a, b=b)

• boxplot(datos$a)

• boxplot(a ~ b, data=datos)

• boxplot(datos$a, notch=T)

• boxplot(datos$a, notch=T, col=2, border=4)

• boxplot(a ~ b, data=datos, col=1:5)
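A sketch of the statistics a boxplot draws. The grouping vector is built with rep(..., each = 100), which gives the same result as the five rep() calls above; set.seed(1) is an assumption for reproducibility, and plot = FALSE returns the statistics without drawing.

```r
set.seed(1)
b <- rep(c("A", "B", "C", "D", "E"), each = 100)
a <- rnorm(500)
datos <- data.frame(a = a, b = b)
s <- boxplot(a ~ b, data = datos, plot = FALSE)  # statistics only, no plot
s$names   # "A" "B" "C" "D" "E"
s$n       # 100 observations per group
s$stats   # one column per group: lower whisker, lower hinge, median, upper hinge, upper whisker
```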

BOXPLOTS

• data(iris)

• names(iris)

• boxplot(iris$Petal.Length)

• boxplot(iris$Petal.Length, notch=T)

• boxplot(iris$Petal.Length, notch=T, col=2, border=4)

• boxplot(Petal.Length ~ Species, data=iris)

• boxplot(Petal.Length ~ Species, data=iris, col=2:4)

• boxplot(iris[,1:4], col=2:5)

QQPLOTS

• ?qqplot

• a <- rnorm(1000, 0, 1)

• qqnorm(a)

• qqline(a, col=2)

• data(precip)

• qqnorm(precip, ylab = "Precipitation [in/yr] for 70 US cities", col=2)
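A sketch of what a normal QQ plot compares. With plot.it = FALSE, qqnorm() returns the theoretical normal quantiles (x) and the data (y); if the data are normal the points lie close to a straight line, so their correlation is close to 1. The exponential sample and set.seed(1) are assumptions added for comparison.

```r
set.seed(1)
a <- rnorm(1000, 0, 1)
q <- qqnorm(a, plot.it = FALSE)        # theoretical quantiles (x) and the data (y)
all.equal(q$y, a)                      # TRUE: y is the sample itself
cor(q$x, q$y)                          # close to 1 for normal data
e <- rexp(1000)                        # a skewed sample for comparison
cor(qqnorm(e, plot.it = FALSE)$x, e)   # lower for this skewed sample
```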

CONDITIONED PLOTS

• ?plot.default; ?pairs; ?coplot

• pairs(iris)

• pairs(iris[, 1:4])

• pairs(iris[, 1:4], panel = panel.smooth, main = "Iris data")

• coplot(Petal.Width ~ Petal.Length | Species, data=iris, rows=1)

• coplot(Petal.Width ~ Petal.Length | Sepal.Length, data=iris, rows=1)

• coplot(Petal.Width ~ Petal.Length | Sepal.Length, given.values=co.intervals(iris$Sepal.Length, 3), data=iris, rows=1)

PLOT

• ?plot; ?plot.default

• a <- rnorm(1000, 0, 1)

• plot(a)

• plot(a, type="l")

• plot(a, type="h")

• plot(a, type="b")

• plot(a, col=2); plot(a, col="red")

• plot(a, pch="*")

• plot(a, pch=2)

• plot(a, pch=3, cex=0.6)

• plot(a, pch=3, cex=0.6, col=6, xlab=" ", ylab="a", main="Residuals")

PLOT

• data(cars)

• plot(cars)

• plot(cars$speed, cars$dist)

• plot(cars$dist, cars$speed)

• plot(cars, type="p", pch=2, col=3, xlab="Velocity", ylab="Distance",

main="About cars", xlim=c(10,20), ylim=c(0,80))

• ?par

INTERACTING WITH FIGURES

• ?identify

• ?locator

• plot(cars)

• identify(cars, labels=cars$speed)

• locator(1)

• locator(10)

ADD LINES AND POINTS

• ?lines; ?points; ?abline; ?text

• plot(cars, type="n")

• points(cars$speed[1:10], cars$dist[1:10], col=6)

• lines(cars)

• lines(cars$speed[1:10], cars$dist[1:10], col=4)

• lines(lowess(cars))

• abline(v=10)

• abline(h=40)

• abline(a=0, b=1)

• text(cars$speed, cars$dist, labels=cars$dist)

ADD LEGENDS

• ?legend

• boxplot(iris$Sepal.Length ~ iris$Species, col=2:4)

• legend(1, 8, levels(iris$Species), fill=2:4)

• boxplot(iris$Sepal.Length ~ iris$Species, col=2:4)

• loc <- locator(1)

• legend(loc, levels(iris$Species), fill=2:4)

LINEAR REGRESSION

• ?lm

• mod.lm <- lm(Petal.Width~Petal.Length, data=iris)

• mod2.lm <- lm(Petal.Width~Petal.Length - 1, data=iris)

• mod3.lm <- lm(Petal.Width~Petal.Length + Species, data=iris)

• mod4.lm <- lm(Petal.Width~Petal.Length * Species, data=iris)

• mod5.lm <- lm(Petal.Width~Petal.Length, subset=Species=="setosa", data=iris)
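A sketch that fits the five models above and checks how many coefficients each formula produces. anova() is an addition here: it compares nested models with F tests, which is one way to decide whether the species terms are needed.

```r
data(iris)
mod.lm  <- lm(Petal.Width ~ Petal.Length, data = iris)
mod2.lm <- lm(Petal.Width ~ Petal.Length - 1, data = iris)        # no intercept
mod3.lm <- lm(Petal.Width ~ Petal.Length + Species, data = iris)  # one intercept per species
mod4.lm <- lm(Petal.Width ~ Petal.Length * Species, data = iris)  # intercepts and slopes per species
mod5.lm <- lm(Petal.Width ~ Petal.Length, subset = Species == "setosa", data = iris)

length(coef(mod.lm)); length(coef(mod3.lm)); length(coef(mod4.lm))  # 2, 4, 6
anova(mod.lm, mod3.lm, mod4.lm)   # F tests for the nested models
```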

LINEAR REGRESSION

• mod.lm

• summary(mod.lm)

• coef(mod.lm)

• residuals(mod.lm)

• predict(mod.lm)

• names(mod.lm)

• plot(iris$Petal.Length, iris$Petal.Width, type="n")

• points(iris$Petal.Length[iris$Species=="setosa"], iris$Petal.Width[iris$Species=="setosa"], col=2)

• abline(coef(mod.lm))
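A sketch of predict() with new data, checked against the fitted line computed by hand from coef(). The petal lengths 1.5 and 5 are hypothetical values chosen for illustration.

```r
mod.lm <- lm(Petal.Width ~ Petal.Length, data = iris)
b <- coef(mod.lm)                              # intercept and slope
new <- data.frame(Petal.Length = c(1.5, 5))    # hypothetical new petal lengths
predict(mod.lm, newdata = new)                 # fitted values at the new lengths
b[1] + b[2] * new$Petal.Length                 # the same, by hand
predict(mod.lm, newdata = new, interval = "confidence")  # with 95% confidence limits
```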

R PROGRAMMING

• if (condition) { instructions }

• if (condition) { instructions } else { instructions }

• while (condition) { instructions }

• for (i in index) { instructions }
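A sketch of the control structures above. Note that else must appear on the same line as the closing brace of the if block. ifelse() is an addition: a vectorised alternative that often replaces such loops in R.

```r
x <- c(-3, 0, 6)
for (i in seq_along(x)) {        # loop over the positions of x
  if (x[i] > 0) {
    cat(x[i], "is positive\n")
  } else if (x[i] < 0) {
    cat(x[i], "is negative\n")
  } else {
    cat(x[i], "is zero\n")
  }
}

total <- 0
n <- 0
while (total < 100) {            # repeat while the condition is TRUE
  n <- n + 1
  total <- total + n
}
n                                # 14: 1 + 2 + ... + 14 = 105 is the first sum >= 100

ifelse(x > 0, "positive", "not positive")   # vectorised alternative to the loop
```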

OWN FUNCTIONS

• Open a text editor to correct/create functions: fix(name)

• Structure:

name <- function(arg1, arg2, arg3) {
  instructions
  return(result)
}
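A sketch of a user-defined function following the structure above. The function cv (coefficient of variation) is a hypothetical example; its na.rm argument has a default value and is passed on to sd() and mean().

```r
# Hypothetical example: coefficient of variation of a numeric vector
cv <- function(x, na.rm = FALSE) {   # na.rm has a default value
  s <- sd(x, na.rm = na.rm)
  m <- mean(x, na.rm = na.rm)
  return(s / m)
}
cv(c(2, 4, 6))                     # 0.5
cv(c(2, 4, NA, 6))                 # NA
cv(c(2, 4, NA, 6), na.rm = TRUE)   # 0.5
```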
