Quantitative Data Analysis Working with R

Introduction to R programming


DESCRIPTION

Quantitative Data Analysis - Part I: Introduction to R programming - Ma


Page 1: Introduction to R programming

Quantitative Data Analysis

Working with R

Page 2: Introduction to R programming

Working with R

What is R

A computer language oriented toward statistical applications

Advantages

Completely free; just download it from the Internet

Many add-on packages for specialized uses

Open source

Page 3: Introduction to R programming

Getting Started: Installing R

Have an Internet connection
Go to http://cran.r-project.org/
On the R for Windows screen, click "base"
Find and click the download R link
Click Run, OK, or Next on all screens
You end up with an R icon on the desktop

Page 4: Introduction to R programming

At http://cran.r-project.org/


Page 5: Introduction to R programming

Downloading Base R

Click on Windows
Then, in the next screen, click on "base"
Then click Run, OK, or Next through the installer screens
And finally "Finish", which will put the R icon on the desktop

Page 6: Introduction to R programming

Rgui and R Console

ending with R prompt (>)

Page 7: Introduction to R programming

The R prompt (>)

> This is the "R prompt". It says R is ready to take your command.

Enter these after the prompt and observe the output:

>2+3

>2^3+(5)

>6/2+(8+5)

>2 ^ 3 + (5)
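For reference, a minimal sketch of what those four commands evaluate to when run as a script. The variable names r1 to r4 are only illustrative; at the prompt you would simply type the expressions.

```r
# R applies the usual precedence: ^ first, then * and /, then + and -
r1 <- 2 + 3            # 5
r2 <- 2^3 + (5)        # 8 + 5 = 13
r3 <- 6/2 + (8 + 5)    # 3 + 13 = 16
r4 <- 2 ^ 3 + (5)      # same as r2: spaces around operators are ignored
print(c(r1, r2, r3, r4))
```

The last expression is the second one with extra spaces, which shows that whitespace does not change the result.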

Page 8: Introduction to R programming

Installing Packages and Libraries

install.packages("akima")
install.packages("chron")
install.packages("lme4")
install.packages("mcmc")
install.packages("odesolve")
install.packages("spdep")
install.packages("spatstat")
install.packages("tree")
install.packages("lattice")

Page 9: Introduction to R programming

Installing Packages and Libraries

Page 10: Introduction to R programming

Installing Packages and Libraries

R.version
installed.packages()
update.packages()
setRepositories()

Page 11: Introduction to R programming

Help

help(mean)
?mean
help() will not find a function in a package unless you install the package and load it with library()
help.search("aspline") will find functions in packages that are installed but not loaded
apropos("lm")

Page 12: Introduction to R programming

Help

For help on a whole package:
help(package=akima)

objects(grep("akima",search()))

library("akima")
my.packages <- search()
aki <- grep("akima",my.packages)
my.objects <- objects(aki)

Page 13: Introduction to R programming

Help

example(mean)

demo()
demo(package = .packages(all.available = TRUE))
demo(graphics)

vignette(all=TRUE)
V <- vignette("sp")
print(V)
edit(V)

Page 14: Introduction to R programming

Maintenance

ls() / objects()
search()
class(a)
rm(a,b,c)
rm(list=ls())

Page 15: Introduction to R programming

Maintenance

getwd()
setwd()
source("myprogram.R")
save(list = ls(all=TRUE), file= "all.Rdata")
load("all.Rdata")
save.image()
savehistory()

Page 16: Introduction to R programming

To cite use of R

To cite the use of R for statistical work, R documentation recommends the following: R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

Get the latest citation by typing citation() at the prompt.

Page 17: Introduction to R programming

Email Support Lists

http://r-project.org under "mailing lists"
r-help is the most general one
Before posting, read: http://www.R-project.org/posting-guide.html
Send the smallest possible example of your problem (generated data is handy)
sessionInfo() will list your computer and R details to cut and paste into your question

Page 18: Introduction to R programming

Quantitative Data Analysis

Programming with R

Page 19: Introduction to R programming

Basic concepts

Code
Commands
Programs
Objects
Types
Functions
Operators

Page 20: Introduction to R programming

assignment

a <- 1
assign("b", 2)

Page 21: Introduction to R programming

Mathematical operators

+ - * / ^    arithmetic
> >= < <= == !=    relational
! & |    logical
$    list indexing (the 'element name' operator)
:    create a sequence
~    model formulae

Page 22: Introduction to R programming

Logical operators

!    logical NOT
&    logical AND
|    logical OR
<    less than
<=    less than or equal to
>    greater than
>=    greater than or equal to
==    logical equals (double =)
!=    not equal
&&    AND with IF
||    OR with IF
xor(x,y)    exclusive OR
isTRUE(x)    an abbreviation of identical(TRUE,x)
all(x)
any(x)

Page 23: Introduction to R programming

Mathematical functions

log(x)    log to base e of x
exp(x)    antilog of x (e^x)
log(x,n)    log to base n of x
log10(x)    log to base 10 of x
sqrt(x)    square root of x

factorial(x)    x!
choose(n,x)    binomial coefficients n!/(x!(n−x)!)
gamma(x)    Γ(x) for real x; (x−1)! for integer x
lgamma(x)    natural log of Γ(x)

Page 24: Introduction to R programming

Mathematical functions

floor(x)    greatest integer ≤ x
ceiling(x)    smallest integer ≥ x
trunc(x)    x truncated toward zero
round(x, digits=0)    round the value of x to an integer
abs(x)    the absolute value of x, ignoring the minus sign if there is one
signif(x, digits=6)    give x to 6 significant digits

Page 25: Introduction to R programming

Trigonometrical functions

cos(x)    cosine of x in radians
sin(x)    sine of x in radians
tan(x)    tangent of x in radians
acos(x), asin(x), atan(x)    inverse trigonometric transformations of real or complex numbers
acosh(x), asinh(x), atanh(x)    inverse hyperbolic trigonometric transformations of real or complex numbers

Page 26: Introduction to R programming

Infinity and Things that Are Not a Number

Inf (is.finite, is.infinite)

3/0

2 / Inf

exp(-Inf)

(0:3)^Inf

NaN (is.nan)

0/0
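As a sketch of what these expressions return, the snippet below stores each result and checks it with the testing functions named on the slide. The variable names are illustrative only.

```r
a  <- 3 / 0        # Inf: division of a non-zero number by zero
b  <- 2 / Inf      # 0
c0 <- exp(-Inf)    # 0
d  <- (0:3)^Inf    # 0 1 Inf Inf
e  <- 0 / 0        # NaN: the result is undefined

print(is.finite(a))    # FALSE
print(is.infinite(a))  # TRUE
print(is.nan(e))       # TRUE
print(d)
```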

Page 27: Introduction to R programming

Vectors

a <- c(1,2,3,4,5)
a <- 1:5
a <- scan()
a <- seq(1,10,2)
b <- 1:4
a <- seq(1,10,along=b)
x <- runif(10)
which(a == 2)
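A short sketch of what the sequence-building calls produce. It skips scan(), which waits for keyboard input, and spells out along.with, the full name of the argument that the slide abbreviates to along. The names s1, s2, and pos are illustrative.

```r
s1 <- seq(1, 10, 2)                 # from 1 to 10 in steps of 2: 1 3 5 7 9
b  <- 1:4
s2 <- seq(1, 10, along.with = b)    # as many equally spaced values as b has elements: 1 4 7 10
pos <- which(s1 == 3)               # index of the element equal to 3, here 2
print(s1); print(s2); print(pos)
```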

Page 28: Introduction to R programming

Plotting functions

x<-seq(-10,10,0.1)
y<-x^3
plot(x,y,type="l")

Page 29: Introduction to R programming

Vector functions

max(x)    maximum value in x
min(x)    minimum value in x
sum(x)    total of all the values in x
sort(x)    a sorted version of x
rank(x)    vector of the ranks of the values in x
order(x)    an integer vector containing the permutation to sort x into ascending order
range(x)    vector of min(x) and max(x)
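The difference between sort, rank, and order is easiest to see on a small vector. This is a minimal sketch with made-up values.

```r
x <- c(3, 1, 4, 1.5, 9)

s <- sort(x)     # 1.0 1.5 3.0 4.0 9.0
r <- rank(x)     # 3 1 4 2 5: the position each value would take after sorting
o <- order(x)    # 2 4 1 3 5: the indices that put x into ascending order
rg <- range(x)   # 1 9

print(x[o])      # indexing by order(x) reproduces sort(x)
```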

Page 30: Introduction to R programming

More functions

cumsum(x)    vector containing the sum of all of the elements up to that point
cumprod(x)    vector containing the product of all of the elements up to that point
cummax(x)    vector of non-decreasing numbers which are the cumulative maxima of the values in x up to that point
cummin(x)    vector of non-increasing numbers which are the cumulative minima of the values in x up to that point
pmax(x,y,z)    vector, of length equal to the longest of x, y or z, containing the maximum of x, y or z for the ith position in each
pmin(x,y,z)    vector, of length equal to the longest of x, y or z, containing the minimum of x, y or z for the ith position in each
rowSums(x)    row totals of dataframe or matrix x
colSums(x)    column totals of dataframe or matrix x
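A small sketch of these functions on made-up data, with the results worked out by hand in the comments.

```r
x <- c(2, 5, 1, 7)
cs <- cumsum(x)     # 2 7 8 15
cp <- cumprod(x)    # 2 10 10 70
cx <- cummax(x)     # 2 5 5 7
cn <- cummin(x)     # 2 2 1 1

# pmax and pmin compare vectors position by position
hi <- pmax(c(1, 8, 3), c(4, 2, 6))   # 4 8 6
lo <- pmin(c(1, 8, 3), c(4, 2, 6))   # 1 2 3

# a 2 x 3 matrix filled column by column: columns (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)
rs <- rowSums(m)    # 9 12
cl <- colSums(m)    # 3 7 11
```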

Page 31: Introduction to R programming

functions

Geometric mean (p.49)

geometric<-function (x) exp(mean(log(x)))

Harmonic mean (p.51)

harmonic<-function (x) 1/mean(1/x)
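A usage sketch for the two functions, repeated here so the snippet runs on its own. The input values are made up so the results are easy to check by hand.

```r
geometric <- function(x) exp(mean(log(x)))
harmonic  <- function(x) 1 / mean(1 / x)

g <- geometric(c(1, 10, 100))   # cube root of 1 * 10 * 100, i.e. 10
h <- harmonic(c(1, 2, 4))       # 3 / (1 + 1/2 + 1/4) = 12/7
print(g); print(h)
```

Both functions assume strictly positive values: log(x) is undefined for negative numbers, and 1/x is infinite at zero.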

Page 32: Introduction to R programming

Exercises

Finding the value in a vector that is closest to a specified value

closest<-function(xv,sv){
  xv[which(abs(xv-sv)==min(abs(xv-sv)))]
}

Calculate a trimmed mean of x which ignores both the smallest and largest values

trimmed.mean <- function (x) {
  mean(x[-c(which(x==min(x)),which(x==max(x)))])
}
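A usage sketch for both exercise functions, repeated here so the snippet is self-contained. The test vectors are made up.

```r
closest <- function(xv, sv) {
  xv[which(abs(xv - sv) == min(abs(xv - sv)))]
}

trimmed.mean <- function(x) {
  mean(x[-c(which(x == min(x)), which(x == max(x)))])
}

cl <- closest(c(1, 5, 9, 14), 6)        # 5 is the nearest value to 6
tm <- trimmed.mean(c(1, 2, 3, 4, 100))  # drops 1 and 100, mean of 2 3 4 is 3
print(cl); print(tm)
```

Note that if the minimum or maximum occurs more than once, every copy is dropped, because which() returns all matching positions.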

Page 33: Introduction to R programming

Sets

union(x,y)
intersect(x,y)
setdiff(x,y)
setequal(x,y)
is.element(el,set)
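A minimal sketch of the set functions on two made-up vectors.

```r
x <- c(1, 2, 3, 4)
y <- c(3, 4, 5)

u  <- union(x, y)        # 1 2 3 4 5
i  <- intersect(x, y)    # 3 4
d  <- setdiff(x, y)      # 1 2: elements of x that are not in y
eq <- setequal(x, y)     # FALSE
el <- is.element(3, y)   # TRUE
```

setdiff is not symmetric: setdiff(y, x) would return 5 instead.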

Page 34: Introduction to R programming

Matrices

X<-matrix(c(1,0,0,0,1,0,0,0,1),nrow=3)
dim(X)
is.matrix(X)

vector<-c(1,2,3,4,4,3,2,1)
V<-matrix(vector,byrow=TRUE,nrow=2)
dim(vector) <- c(2,4)

Page 35: Introduction to R programming

Matrices

X<-rbind(X,apply(X,2,mean))
X<-cbind(X,apply(X,1,var))

Page 36: Introduction to R programming

sweep

matdata<-read.table("data\\sweepdata.txt")
cols<-apply(matdata,2,mean)
sweep(matdata,2,cols)
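Since the data file is not included here, the sketch below applies the same idea to a small made-up matrix. By default sweep subtracts the supplied statistic, so passing the column means centres each column on zero.

```r
# made-up 3 x 2 matrix standing in for the contents of the data file
m <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3)

cols <- apply(m, 2, mean)       # column means: 2 and 20
centered <- sweep(m, 2, cols)   # subtract each column's mean from that column
print(centered)                 # column 1: -1 0 1, column 2: -10 0 10
```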

Page 37: Introduction to R programming

lists

person <- list()
person$name <- "Alberto"
person$age <- 37
person$nationality <- "Spain"
class(person)
[1] "list"

> person
$name
[1] "Alberto"

$age
[1] 37

$nationality
[1] "Spain"

names(person)
[1] "name" "age" "nationality"

Page 38: Introduction to R programming

Strings

phrase<-"the quick brown fox jumps over the lazy dog"
letras <- table(strsplit(phrase,split=character(0)))
numwords<-1+table(strsplit(phrase,split=character(0)))[1]

words <- unlist(strsplit(phrase,split=" "))
words[grep("o",words)]
"fox" %in% unlist(strsplit(phrase,split=" "))
unlist(strsplit(phrase,split=" ")) %in% c("fox","dog")

Page 39: Introduction to R programming

Strings

nchar(words)
paste(words[1],words[2])
toupper(words)

Page 40: Introduction to R programming

Regular expressions

grep("^t", words)
words[grep("^t", words)]
words[grep("s$", words)]
gsub("o","O",words)
regexpr()
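A self-contained sketch of these calls, rebuilding words from the phrase used on the earlier Strings slide. The result names are illustrative.

```r
phrase <- "the quick brown fox jumps over the lazy dog"
words <- unlist(strsplit(phrase, split = " "))

starts.t <- grep("^t", words)          # indices of words starting with t: 1 7
ends.s <- words[grep("s$", words)]     # words ending in s: "jumps"
swapped <- gsub("o", "O", words)       # replace every o with O in each word
first.o <- as.integer(regexpr("o", words))  # position of the first o, or -1 if none
print(swapped)
```

grep returns positions, which is why it is wrapped in words[...] to get the matching words themselves.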

Page 41: Introduction to R programming

Dataframes

lista <- data.frame()
lista[1,1] = "Alberto"
lista[1,2] = 37
lista[2,1] = "Ana"
lista[2,2] = 23
names(lista) <- c("Name", "Age")
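Filling a data frame cell by cell works, but the same table is more commonly built in one call from whole columns. A minimal sketch, assuming the illustrative column names Name and Age and the object name people:

```r
# one vector per column; stringsAsFactors = FALSE keeps the names as plain text
people <- data.frame(Name = c("Alberto", "Ana"),
                     Age = c(37, 23),
                     stringsAsFactors = FALSE)

ana.age <- people$Age[people$Name == "Ana"]   # look up a value by condition
print(people)
print(ana.age)
```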

Page 42: Introduction to R programming

Missing values

NA (is.na)

x<-c(1:8,NA)
mean(x)
mean(x,na.rm=TRUE)
which(is.na(x))
as.vector(na.omit(x))
x[!is.na(x)]
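A sketch of what each of these calls returns for the vector on the slide. The result names are illustrative.

```r
x <- c(1:8, NA)

m1 <- mean(x)                  # NA: one missing value makes the whole mean missing
m2 <- mean(x, na.rm = TRUE)    # 4.5: the mean of 1 to 8
pos <- which(is.na(x))         # 9: position of the missing value
clean1 <- as.vector(na.omit(x))  # 1 to 8, with the na.omit bookkeeping stripped
clean2 <- x[!is.na(x)]           # the same result by logical indexing
```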

Page 43: Introduction to R programming

Dates and Times in R

date()
date<- as.POSIXlt(Sys.time())
unlist(unclass(date))
difftime()
excel.dates <- c("27/02/2004", "27/02/2005", "14/01/2003","28/06/2005", "01/01/1999")
strptime(excel.dates,format="%d/%m/%Y")
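A sketch that combines strptime and difftime on the dates above. Setting tz = "UTC" is an assumption made here so that daylight-saving changes cannot affect the day count.

```r
excel.dates <- c("27/02/2004", "27/02/2005", "14/01/2003",
                 "28/06/2005", "01/01/1999")

# parse day/month/year text into date-time objects
d <- strptime(excel.dates, format = "%d/%m/%Y", tz = "UTC")

# time between the first two dates; 2004 was a leap year, so this is 366 days
gap <- difftime(d[2], d[1], units = "days")

iso <- format(d, "%Y-%m-%d")   # reprint the dates in year-month-day form
print(gap); print(iso)
```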

Page 44: Introduction to R programming

Testing and Coercing in R

Page 45: Introduction to R programming

if

if (y > 0) print(1) else print(-1)
z <- ifelse(y < 0, -1, 1)
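The two forms differ in scope: if tests a single value, while ifelse works element by element on a whole vector. A minimal sketch on made-up values, where sign.if is a hypothetical wrapper around the if statement:

```r
y <- c(-2, 0, 3)

# vectorized: one result per element of y
z <- ifelse(y < 0, -1, 1)                     # -1 1 1

# if handles one value at a time, so apply it to each element in turn
sign.if <- function(v) if (v > 0) 1 else -1
s <- sapply(y, sign.if)                       # -1 -1 1
print(z); print(s)
```

The two conditions on the slide are not mirror images: at y equal to 0 the if form gives -1 and the ifelse form gives 1.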

Page 46: Introduction to R programming

Loops and Repeats

for (i in 1:10) print(i^2)

i <- 1
while (i <= 10) {
  print(i^2)
  i <- i + 1
}

i <- 1
repeat {
  if (i > 10) break
  print(i^2)
  i <- i + 1
}

Page 47: Introduction to R programming

Exercise

Compute the Fibonacci series 1, 1, 2, 3, 5, 8

fibonacci<-function(n) {
  a<-1
  b<-0
  while(n>0) {
    swap<-a
    a<-a+b
    b<-swap
    n<-n-1
  }
  b
}
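The function returns the nth term only, so producing the series means calling it for each n. A sketch using sapply, with the function repeated so the snippet runs on its own:

```r
fibonacci <- function(n) {
  a <- 1
  b <- 0
  while (n > 0) {
    swap <- a       # remember the current term
    a <- a + b      # next term is the sum of the last two
    b <- swap
    n <- n - 1
  }
  b
}

fib6 <- sapply(1:6, fibonacci)   # 1 1 2 3 5 8
print(fib6)
```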

Page 48: Introduction to R programming

Avoid loops

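x<-runif(10000000)

# built-in, vectorized maximum
system.time(max(x))

# the same maximum found with an explicit loop, timed with proc.time()
pc<-proc.time()
cmax<-x[1]
for (i in 2:length(x)) {
  if(x[i]>cmax) cmax<-x[i]
}
proc.time()-pc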

Page 49: Introduction to R programming

switch

central<-function(y, measure) {
  switch(measure,
    Mean = mean(y),
    Geometric = exp(mean(log(y))),
    Harmonic = 1/mean(1/y),
    Median = median(y),
    stop("Measure not included"))
}
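A usage sketch for central, repeated here so the snippet is self-contained. The inputs are made up, and the tryCatch wrapper is added only to show the error message without stopping the script.

```r
central <- function(y, measure) {
  switch(measure,
    Mean = mean(y),
    Geometric = exp(mean(log(y))),
    Harmonic = 1 / mean(1 / y),
    Median = median(y),
    stop("Measure not included"))   # unnamed last argument is the default branch
}

g  <- central(c(1, 10, 100), "Geometric")   # 10
md <- central(c(1, 10, 100), "Median")      # 10
mn <- central(c(1, 10, 100), "Mean")        # 37

# an unknown name falls through to stop(); capture its message
msg <- tryCatch(central(1:3, "Mode"), error = function(e) conditionMessage(e))
print(msg)
```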

Page 50: Introduction to R programming

Quantitative Data Analysis

Working with datasets

Page 51: Introduction to R programming

Help for Datasets

To list built-in datasets:

data()
data(package = .packages(all.available = TRUE))
data(swiss)

For help on a dataset:

help(swiss)

"Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888."

Page 52: Introduction to R programming

The attach Command

To access individual variables, do this:
> attach(swiss)
Now try:
> mean(Fertility)
> detach(swiss)

Page 53: Introduction to R programming

Using R Functions: Simple Stuff

rownames(swiss)
colnames(swiss)
summary(swiss)

Applying functions

mean(swiss$Fertility)
sd(swiss$Fertility)
apply(swiss,2,max)
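A sketch of these calls on the built-in swiss data, plus two alternatives that are not on the slide: sapply, which applies a function to each column of a data frame, and with, which gives access to the column names without attaching the dataset.

```r
data(swiss)

col.max  <- apply(swiss, 2, max)    # maximum of each column (2 means columns)
col.max2 <- sapply(swiss, max)      # same result, column by column

fert.mean <- with(swiss, mean(Fertility))   # no attach() or swiss$ needed
print(col.max)
print(fert.mean)
```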

Page 54: Introduction to R programming

Factors

class(Detergent)
nlevels(Detergent)
levels(Detergent)
as.factor()

Page 55: Introduction to R programming

Working with your dataset

fix(swiss)
hist(Agriculture)
plot(Catholic,Fertility)

Page 56: Introduction to R programming

Working with your own datasets

write.table(swiss, "swiss.txt")
swiss2 <- read.table("swiss.txt")

data<-read.table(file.choose(),header=TRUE)

readLines()
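A round-trip sketch of writing a dataset and reading it back. It writes to a temporary file instead of swiss.txt, an assumption made here so that nothing is left in the working directory.

```r
data(swiss)
tmp <- tempfile(fileext = ".txt")   # throwaway file path

write.table(swiss, tmp)             # writes a header line and quoted row names
swiss2 <- read.table(tmp)           # header is detected from the shorter first line

same.shape <- identical(dim(swiss), dim(swiss2))
same.names <- identical(names(swiss), names(swiss2))
same.rows  <- identical(rownames(swiss), rownames(swiss2))

first.lines <- readLines(tmp, n = 2)   # peek at the raw text of the file
unlink(tmp)                            # remove the temporary file
print(first.lines)
```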

Page 57: Introduction to R programming

Reading data from files

read.table(file) reads a file in table format and creates a data frame from it; the default separator sep="" is any whitespace; use header=TRUE to read the first line as a header of column names; use as.is=TRUE to prevent character vectors from being converted to factors; use comment.char="" to prevent "#" from being interpreted as a comment; use skip=n to skip n lines before reading data; see the help for options on row naming, NA treatment, and others

read.csv("filename", header=TRUE) same as read.table, but with defaults set for reading comma-delimited files

read.delim("filename", header=TRUE) same as read.table, but with defaults set for reading tab-delimited files

read.fwf(file,widths) reads a table of fixed width formatted data into a data.frame; widths is an integer vector giving the widths of the fixed-width fields

Page 58: Introduction to R programming

Example

data<-read.table(".\\data\\daphnia.txt",header=TRUE)
names(data)
attach(data)
table(Detergent)
tapply(Growth.rate,Detergent,mean)
aggregate(Growth.rate,list(Detergent), mean)
tapply(Growth.rate,list(Water,Daphnia),median)
with(data,boxplot(Growth.rate ~ Detergent))
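The data file is not included here, so the sketch below shows the grouping calls on a made-up stand-in. The column names Growth.rate and Detergent come from the example; the values and the brand labels are invented for illustration and are not the real data.

```r
# invented stand-in for the data file: two measurements per detergent
toy <- data.frame(Growth.rate = c(2, 4, 3, 5, 6, 8),
                  Detergent = c("BrandA", "BrandA", "BrandB",
                                "BrandB", "BrandC", "BrandC"))

counts <- table(toy$Detergent)                         # rows per group: 2 2 2
means <- tapply(toy$Growth.rate, toy$Detergent, mean)  # group means: 3 4 7

# aggregate gives the same means as a data frame, one row per group
agg <- aggregate(Growth.rate ~ Detergent, data = toy, FUN = mean)
print(means)
print(agg)
```

tapply returns a named vector, which is convenient for quick inspection, while aggregate returns a data frame that can be passed on to further analysis.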