
Introduction to R: Joseph Powell

Overall Aims

• Introduce programming concepts relevant to MX

• Demonstrate the strengths (and weaknesses) of R


Books

• The R Book – Crawley (2007)

• Introductions to statistics using R
  – Cohen Y. and Cohen J. Y. (2008). Statistics and Data with R.
  – Crawley M. (2005). Statistics: An Introduction using R.
  – Dalgaard P. (2002). Introductory Statistics with R.
  – Maindonald J. & Braun J. (2003). Data Analysis and Graphics Using R: An Example-based Approach.

• Books on biological topics
  – Paradis E. (2006). Analysis of Phylogenetics and Evolution with R.
  – Broman K. W. & Sen S. (2009). A Guide to QTL Mapping with R/qtl.
  – Bolker B.M. (2008). Ecological Models and Data in R.

• Books on statistical topics
  – Aitkin M. et al. (2009). Statistical Modelling in R.
  – Faraway J. (2009). Linear Models with R.
  – Albert J. (2009). Bayesian Computation with R.
  – Bivand R.S. et al. (2009). Applied Spatial Data Analysis with R.
  – Cowpertwait P.S.P. & Metcalfe A.V. (2009). Introductory Time Series with R.

• Books on R specifics and R programming
  – Spector P. (2008). Data Manipulation with R.
  – Murrell P. (2006). R Graphics.
  – Chambers J. M. (2008). Software for Data Analysis: Programming with R.


Websites

• Websites
  – CRAN R: http://www.r-project.org/
  – R cookbook: http://www.r-cookbook.com/
  – R graphics: http://addictedtor.free.fr/graphiques/
  – R wiki: http://wiki.r-project.org/
  – Mailing lists: http://www.r-project.org/mail.html
  – R seek: http://www.rseek.org/

• Websites on statistical topics
  – R genetics: http://rgenetics.org/trac/rgalaxy
  – Bioconductor: http://www.bioconductor.org/


The console

• Load up R

• Console window appears, with a command prompt

• Everything in the R console can be partitioned into two fundamental operations:
  – Input variables
    > x <- 2
  – Output variables
    > x
    [1] 2


Objects

• Names
  – Case sensitive, no spaces
  – Must begin with a letter, but can also contain numbers and the characters . and _
  – Try to give your objects meaningful names

> My_f4vourite.langua6e_evR <- "R"

• x, y and My_f4v… are objects that we have created

> ls() # this will bring up a list of all our objects
> rm(y) # this deletes y (forever)
> rm(list=ls()) # this deletes everything (..forever)
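The naming rules and housekeeping commands above can be tried in a short sketch. The object names here are made up for illustration and are not from the slides.

```r
# Illustrative object names (assumptions, not from the slides)
height_cm <- 172
favourite_language <- "R"

# Names are case sensitive: Height_cm is a different object, which does not exist
lower_exists <- exists("height_cm")   # TRUE
upper_exists <- exists("Height_cm")   # FALSE

# Remove a single object and confirm it has gone
rm(height_cm)
gone <- !exists("height_cm")          # TRUE
```

Removing one named object with rm() is usually safer than rm(list=ls()), which clears everything without asking.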


Workspace 1

• Everything shown in this list of objects comprises our 'workspace'

> ls()
[1] "My_f4vourite.langua6e_evR" "x" "y"
> save.image(file="myworkspace.RData")
> rm(list=ls())
> ls()
character(0)
> load(file = "myworkspace.RData")
> ls()
[1] "My_f4vourite.langua6e_evR" "x" "y"

• Objects are internal to R
  – Does not behave like a file structure on the computer
  – Can't be read or interpreted outside R (?)
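A minimal sketch of the same save-and-restore round trip, written so that it does not wipe your own session. It assumes a temporary file and two separate environments (`ws` and `restored` are illustrative names) in place of the global workspace used on the slide.

```r
# Keep the example objects in their own environment (an assumption made
# here so the sketch never touches objects in your real workspace)
ws <- new.env()
assign("x", 2, envir = ws)
assign("y", c(3, 4, 12), envir = ws)

# Save every object in that environment to a temporary .RData file
ws_file <- tempfile(fileext = ".RData")
save(list = ls(ws), envir = ws, file = ws_file)

# Load the file into a new, empty environment;
# load() returns the names of the objects it restored
restored <- new.env()
loaded_names <- load(ws_file, envir = restored)
restored_x <- get("x", envir = restored)
```

In everyday use, save.image() and load() without the `envir` argument act on the global workspace, as on the slide.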


Workspace 2

• You can select which objects to save

> save(y, x, file = "two_objects.RData")

• Different computer folders can be accessed

> dir() # lists the files in the current working directory
> setwd("~/work_directory") # sets R's focus to a different computer folder
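As a hedged sketch of saving selected objects into a different folder: the example below uses R's temporary directory as a stand-in for "~/work_directory" (an assumption, so it runs on any machine) and getwd() to remember the starting folder.

```r
x <- 2
y <- c(3, 4, 12)
z <- "not saved"

old_dir <- getwd()     # remember the current working directory
setwd(tempdir())       # move R's focus to a scratch folder

# Only x and y are written to the file; z is left out
save(y, x, file = "two_objects.RData")
saved_ok <- file.exists("two_objects.RData")

setwd(old_dir)         # return to where we started
```

Restoring the original directory at the end keeps later relative file names pointing where you expect.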


Built-in functions

• Native functions make R succinct

• Diverse range available from graphics to data manipulation to statistical algorithms etc.

• Highly optimised so use them if they are available instead of writing your own

• Function structure:

> function_name(<argument 1>, <argument 2>, …)


Missing values

• NA is a “reserved” word in R

• It is a single element (length 1) that indicates a missing value

• A helpful alternative to coding missing values with a number (e.g. -99)

> my_array <- c(NA,100,120,120,120,130,NA)
> sum(my_array)
[1] NA
> sum(my_array, na.rm=TRUE) # most functions allow you to explicitly state how to handle NA
[1] 590
> table(my_array) # HOWEVER the default action varies from function to function
my_array
100 120 130
  1   3   1
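A short sketch of a few other ways to inspect and handle NA, using the same vector. The choice of is.na(), mean() and the `useNA` argument of table() is illustrative rather than taken from the slides.

```r
my_array <- c(NA, 100, 120, 120, 120, 130, NA)

# is.na() flags the missing elements; summing the flags counts them
n_missing <- sum(is.na(my_array))            # 2

# As with sum(), mean() needs na.rm = TRUE to ignore missing values
mean_observed <- mean(my_array, na.rm = TRUE)   # 118

# table() silently drops NA by default; useNA = "ifany" reports them
counts <- table(my_array, useNA = "ifany")
```

Counting the missing values before an analysis is a cheap check that a silent default has not dropped more data than expected.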


R help pages

• Each function has its own unique syntax
  – Default arguments
  – Data structure requirements
  – Output options

> ?seq # brings up help page of seq() function
> ??"sequence" # searches for all related functions

• Note
> seq(from = 2, to = 100, by = 2)
is clearer than
> seq(2,100,2)
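A small sketch confirming that the named and positional calls give the same result, and that named arguments may be supplied in any order. The variable names are illustrative.

```r
by_name     <- seq(from = 2, to = 100, by = 2)
by_position <- seq(2, 100, 2)

# Both calls produce the same 50 even numbers
same_result <- identical(by_name, by_position)   # TRUE
n_values <- length(by_name)                      # 50

# Named arguments can be given in any order
reordered <- seq(by = 2, to = 100, from = 2)
same_again <- identical(by_name, reordered)      # TRUE
```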


Basic Scripting

• Note pad / text editor
  – Within the R GUI
  – Open with: File > New Script or Ctrl+N
  – Layout as tile is useful: Windows > Tile


Basic Scripting

• Note pad / text editor
  – Useful for keeping all work together
  – Scripts can be saved
  – Can be used to save a "program"
  – Add # comments
  – Check individual bits of code
  – Ctrl+R runs:
    • Whole line
    • Selected code


Basic Scripting

• Brackets
  – ( ) functions
  – [ ] subsets
  – { } processes

• Subsets
  – Take a subset of an object
  – Objects have either 1 x n, or m x n dimensions

A vector (1 x n):

> x
[1]  2  5  6  2  6 77 55
> x[5]
[1] 6

A matrix (m x n), indexed as [rows, columns]:

> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x[3,4]
[1] 12
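The two subsetting examples can be reproduced with the sketch below. The names `v` and `m` are chosen here to keep the vector and the matrix apart, and the extra rows, columns and logical subsets are illustrative additions.

```r
# A vector: one index inside the square brackets
v <- c(2, 5, 6, 2, 6, 77, 55)
fifth <- v[5]                # 6

# A matrix: [rows, columns]; matrix() fills column by column
m <- matrix(1:12, nrow = 3)
cell <- m[3, 4]              # 12

# Leaving an index blank selects the whole row or column
second_row   <- m[2, ]       # 2 5 8 11
first_column <- m[, 1]       # 1 2 3

# A logical condition can also be used as the index
large_values <- v[v > 5]     # 6 6 77 55
```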


Basic Scripting

• Data input
  – Direct input into the console
    • scan()
  – Reading in data
    • read.table / read.csv
      – "name.txt"
      – "c:\\temp\\name.txt"
      – file.choose()
      – list.files()
      – dir()

> y <- scan()
1: 3
2: 4
3: 12
4: 3
5: 5
6: 2
7: 14
8:
Read 7 items

> dir()
[1] "temp.csv" "temp2.csv" "name.txt"
> y <- read.table("name.txt", header=TRUE, sep="\t")
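A self-contained sketch of reading a tab-delimited file. Because "name.txt" is not available here, the example first writes a small made-up file to a temporary location; the column names and values are assumptions for illustration.

```r
# Create a small tab-delimited file to read (made-up contents)
tmp_file <- tempfile(fileext = ".txt")
writeLines(c("id\tweight",
             "1\t22.9",
             "2\t27.5",
             "3\t25.5"), tmp_file)

# header = TRUE takes column names from the first line;
# sep = "\t" says the columns are separated by tabs
y <- read.table(tmp_file, header = TRUE, sep = "\t")

n_rows_cols <- dim(y)      # 3 rows, 2 columns
col_names <- names(y)      # "id" "weight"
```

If the separator is given wrongly, read.table() typically returns a single wide column, so checking dim() straight after reading is a useful habit.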


Basic Scripting

• Data output
  – Direct output from the console to a file
    • sink()
  – Writing out data
    • write.table / write.csv
      – "name.txt"
      – "c:\\temp\\name.txt"

> sink("sink_tmp.txt") # divert console output to a file
> i <- 1:10
> outer(i, i, "*")
> sink() # return output to the console

> dir()
[1] "temp.csv" "temp2.csv" "name.txt"
> write.table(y, file="name.txt", col.names=TRUE, sep="\t")
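A hedged sketch of writing a table out and reading it back to check the round trip. The data frame `results` and the temporary file are illustrative; `quote = FALSE` and `row.names = FALSE` are optional choices made here to keep the file plain.

```r
# Made-up data to write out
results <- data.frame(id = 1:3, weight = c(22.9, 27.5, 25.5))

out_file <- tempfile(fileext = ".txt")

# The object comes first, then the file name; column names are
# controlled with col.names (write.table has no 'header' argument)
write.table(results, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

# Read the file back and compare with the original
check <- read.table(out_file, header = TRUE, sep = "\t")
round_trip_ok <- isTRUE(all.equal(results, check))
```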


Basic Scripting

• Adding rows and columns
  – Allows objects to be joined, either to an existing object or to make a new object
  – cbind(): joins objects side by side, as columns
  – rbind(): joins objects one under the other, as rows

> y1
     [,1] [,2] [,3]
[1,]    1    3 12.5
[2,]    1    2 13.8
[3,]    1    5 15.3
[4,]    1    4 16.8

> y2
      [,1]
[1,] 0.349
[2,] 0.745
[3,] 0.684
[4,] 0.964

> y3 <- cbind(y1, y2)
> y3
     [,1] [,2] [,3]  [,4]
[1,]    1    3 12.5 0.349
[2,]    1    2 13.8 0.745
[3,]    1    5 15.3 0.684
[4,]    1    4 16.8 0.964

> y3 <- rbind(y1, y2[1:3])
> y3
      [,1]  [,2]   [,3]
[1,] 1.000 3.000 12.500
[2,] 1.000 2.000 13.800
[3,] 1.000 5.000 15.300
[4,] 1.000 4.000 16.800
[5,] 0.349 0.745  0.684
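The example can be rebuilt from scratch as follows. The matrices are typed in from the output shown on the slide, and `by_column` and `by_row` are illustrative names used instead of reusing y3.

```r
# Rebuild the two objects (matrix() fills column by column)
y1 <- matrix(c(1, 1, 1, 1,
               3, 2, 5, 4,
               12.5, 13.8, 15.3, 16.8), nrow = 4)
y2 <- matrix(c(0.349, 0.745, 0.684, 0.964), ncol = 1)

# cbind() needs the same number of rows in each object
by_column <- cbind(y1, y2)        # 4 rows, 4 columns

# rbind() needs the same number of columns; y2[1:3] is a
# length-3 vector, so it fits under the 3 columns of y1
by_row <- rbind(y1, y2[1:3])      # 5 rows, 3 columns
```

Checking dim() before and after joining is a quick way to confirm the objects were combined in the intended direction.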


Basic Scripting

• for loops
  – loop through a set of commands a given number of times
  – very useful, but are not optimal for memory

> dim(y)
[1] 10 10

> for(i in 1:nrow(y)) { y_mean <- mean(y[i, 1:10]) } # y_mean is overwritten on every pass
> y_mean
[1] 0.1974492

> out <- array(0, c(nrow(y), 1)) # one slot per row to store the results
> for(i in 1:nrow(y)) { out[i] <- mean(y[i, ]) }
> out
            [,1]
 [1,] -0.3110800
 [2,] -0.2000344
 [3,]  0.2019573
 [4,]  0.2859823
 [5,]  0.1932523
 [6,]  0.2759323
 [7,] -0.2571102
 [8,] -0.1037983
 [9,]  0.3522018
[10,]  0.1974492
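A sketch comparing the row-mean loop with two built-in alternatives. The matrix here is simulated with rnorm() under a fixed seed, so its values will not match the slide's output; rowMeans() and apply() are standard base R functions offered as one possible replacement for the loop.

```r
set.seed(1)                            # simulated data, not the slide's y
y <- matrix(rnorm(100), nrow = 10)

# Loop version: pre-allocate the result, then fill one element per row
out <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  out[i] <- mean(y[i, ])
}

# Built-in alternatives that avoid writing the loop by hand
means_builtin <- rowMeans(y)
means_apply   <- apply(y, 1, mean)     # 1 = apply the function over rows

loop_matches_builtin <- isTRUE(all.equal(out, means_builtin))
loop_matches_apply   <- isTRUE(all.equal(out, means_apply))
```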


Data Manipulation

• Check data
  – dim()
  – mydata[1:10, 1:10]
  – str()
  – summary()
  – head()
  – tail()
  – table()
  – etc…

> mydata <- read.table("mydata.txt", header=T, sep="\t")
> dim(mydata)
[1]  642 1470

> mydata[1:10, 1:10]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    2    1    2    1    2    0    1    0     1
 [2,]    0    0    2    2    0    0    1    2    1     2
 [3,]    0    2    2    2    1    1    0    0    2     1
 [4,]    2    0    2    2    2    0    1    2    0     1
 [5,]    2    0    0    2    0    1    1    0    2     0
 [6,]    2    1    2    1    1    0    2    2    1     1
 [7,]    1    1    2    2    1    2    2    2    0     1
 [8,]    0    1    0    0    0    1    1    1    1     1
 [9,]    0    0    1    2    1    2    2    0    0     1
[10,]    1    0    1    1    2    0    1    0    0     1
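The checking functions listed above can be tried on a small simulated data set. Since "mydata.txt" is not available, the matrix below is made up with sample() under a fixed seed and is only a stand-in for the real file.

```r
set.seed(42)
# Stand-in for mydata: 20 rows x 15 columns of values 0, 1 or 2
mydata <- matrix(sample(0:2, 20 * 15, replace = TRUE), nrow = 20)

data_dim <- dim(mydata)        # 20 15
corner <- mydata[1:5, 1:5]     # look at the top-left corner only
str(mydata)                    # compact description of the structure
first_rows <- head(mydata, 3)  # first three rows
value_counts <- table(mydata)  # how often each value occurs
```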


Data Manipulation

• Reordering
  – If you have a data.frame or matrix (numbers or letters)
  – Use: order()
  – index <- order(old[,1], decreasing=T)

> dim(lamb)
[1] 1600    5
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     A 25.52592    1   1   M
4     A 25.56016    1   1   M
5     A 24.53296    1   2   F
6     A 22.03344    1   2   F

> lamb <- lamb[order(lamb$sex, decreasing=F), ]

> head(lamb)
   Field   Weight sire dam sex
1      A 22.92368    1   1   F
2      A 27.52896    1   1   F
5      A 24.53296    1   2   F
6      A 22.03344    1   2   F
9      A 30.37944    2   1   F
10     A 25.93680    2   1   F


Data Manipulation

• Reordering
  – order()

> lamb <- lamb[order(lamb$sex, decreasing=F), ]

Expanded way:

> rows <- order(lamb$sex, decreasing=F)
> lamb <- lamb[rows, ]

> index <- order(lamb$sex, decreasing=F)
> head(index)
[1]  1  2  5  6  9 10
> lamb <- lamb[index, ]
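A minimal sketch of order() on a tiny made-up data frame (four rows, not the real lamb data), showing that order() returns row positions rather than sorted values, and how a second sorting key can be added.

```r
# Tiny made-up stand-in for the lamb data
lamb <- data.frame(Weight = c(25.5, 22.9, 27.5, 24.5),
                   sex    = c("M", "F", "F", "M"),
                   stringsAsFactors = FALSE)

# order() returns the row positions that would sort the column
index <- order(lamb$sex)                    # 2 3 1 4
sorted <- lamb[index, ]

# A second key breaks ties: within each sex, heaviest first
index2 <- order(lamb$sex, -lamb$Weight)     # 3 2 1 4
sorted2 <- lamb[index2, ]
```

Because the index holds row positions, the same index can be reused to reorder any other object with matching rows.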


Data Manipulation

• Replacing
  – index
  – which()

> class(lamb)
[1] "matrix"
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     B 25.52592    1   1   M

> index <- lamb[,1]=="A"
> head(index)
[1] TRUE TRUE FALSE TRUE FALSE
> lamb[index, 1] <- "C"

> head(lamb)
  Field   Weight sire dam sex
1     C 22.92368    1   1   F
2     C 27.52896    1   1   F
3     B 25.52592    1   1   M

> index <- which(lamb[,1]=="A")
> head(index)
1 2 4 6 7 10
> lamb[index, 1] <- "C"

Put it together:

> lamb[which(lamb[,1]=="A"), 1] <- "C"
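A short sketch of the two indexing styles on a made-up character vector (`field` is an illustrative name), showing that a logical index and which() select the same elements.

```r
field <- c("A", "A", "B", "A", "B", "A")

# Logical index: one TRUE/FALSE per element
is_a <- field == "A"

# which() converts the logical index into positions
positions <- which(is_a)          # 1 2 4 6

# Either form can be used to replace the matching elements
replaced_logical <- field
replaced_logical[is_a] <- "C"

replaced_which <- field
replaced_which[positions] <- "C"

same_result <- identical(replaced_logical, replaced_which)   # TRUE
```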


Data Manipulation

• Replacing

> class(lamb)
[1] "matrix"
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     B 25.52592    1   1   M

> index <- lamb[,2] <= 22.000
> table(index)
index
FALSE  TRUE
 1553    47

> lamb[index, 2] <- NA # NA without quotes; "NA" would be stored as text

> which(lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
 214  363  496  842  921  983 1103 1126

> which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
 214  363  496

> new_lamb <- lamb[which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0), ]
> new_lamb
    Field Weight sire dam sex
214     A   2046   27   2   F
363     A   2008   46   1   M
496     A   2041   62   2   M
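A sketch of combining conditions with & and of replacing values with a true missing value, on a made-up four-row data frame. It also illustrates why the unquoted NA matters: assigning the string "NA" into a numeric vector turns the whole vector into text.

```r
# Made-up stand-in for the lamb data
lamb <- data.frame(Field  = c("A", "B", "A", "A"),
                   Weight = c(20.5, 20.8, 23.1, 21.9),
                   stringsAsFactors = FALSE)

# Several conditions joined with & must all hold for a row to match
keep <- which(lamb$Field == "A" & lamb$Weight >= 20 & lamb$Weight <= 21)  # 1
new_lamb <- lamb[keep, ]

# Replace light animals with a real missing value
light <- lamb$Weight <= 21
lamb$Weight[light] <- NA
still_numeric <- is.numeric(lamb$Weight)    # TRUE

# Assigning the string "NA" instead coerces the vector to character
wrong <- c(20.5, 23.1)
wrong[1] <- "NA"
became_text <- is.character(wrong)          # TRUE
```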


Graphics with R: Overview

1. Why graphics?

2. Why graphics in R?

3. The R graphics systems (did you really expect just one?)

4. Graphics basics and examples

5. Customisation of a graphic

6. Overview of different systems and packages


plot(x, y, …)

> ?Formaldehyde
> head(Formaldehyde)
  carb optden
1  0.1  0.086
2  0.3  0.269
3  0.5  0.446
4  0.6  0.538
5  0.7  0.626
6  0.9  0.782
> plot(Formaldehyde)
> ?par
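A hedged sketch of customising the same plot with a few common arguments (axis labels, title, point symbol and colour). Writing the result to a temporary PDF is an assumption made here so the example also runs without a screen; in an interactive session the pdf() and dev.off() lines can simply be left out.

```r
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)                          # open a PDF graphics device

# plot(x, y, ...) with explicit x and y and some customisation
plot(Formaldehyde$carb, Formaldehyde$optden,
     xlab = "Carbohydrate (ml)",
     ylab = "Optical density",
     main = "Formaldehyde determination",
     pch  = 19,                         # solid circles
     col  = "blue")

dev.off()                               # close the device to finish the file
plot_written <- file.exists(plot_file)
```

Many of the options documented under ?par, such as `pch` and `col`, can be passed directly to plot() in this way.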
