Introduction to
Damaris Zurell
Dynamic Macroecology
Swiss Federal Research Institute [email protected]
http://www.r-project.org/
R is a tool …
Data manipulation
y ~ x Data modelling
Data visualisation
• Integrating different data sources
• Aggregating, disaggregating, transforming data ...
• Statistical modelling
• Numeric simulations
• Visualising models
• Making your own graphics
R is an environment
The R environment: "more than an incremental accretion of very specific and inflexible tools"
"fully planned and coherent system"
R history
• First, there was S: developed in 1976 by John Chambers at Bell Laboratories (AT&T) as a programming language for statistics, stochastic simulation, and graphical display
• 1988, commercial implementation as S-PLUS (Insightful Corp.)
• 1992, Ross Ihaka and Robert Gentleman start R, a free implementation under the GNU General Public License, mainly for teaching purposes
• 1997, founding of the R Development Core Team (abbreviated: R Core Team), today with 20 members from science and industry
• 1998, founding of the Comprehensive R Archive Network (CRAN), today with >4000 additional packages
• 2000, R-1.0.0: first version completely compatible with S
R pros
• Open source, runs on many operating systems
• "At the pulse of science": new methods are implemented in R by scientists/developers and available as packages
• Publication-ready graphics
• Excellent for simulations, programming, computer-intensive analyses, automation
• Best option for statistical computing
• Active user community: help from the R Core Team, R-Help mailing list, fast bug-fixing
R cons
• No fancy graphical user interface, bulky
• Steep learning curve for newcomers, high beginner's frustration
• Easy to make mistakes
• Computation on big data sets is limited by RAM
• "Many roads lead to Rome": there are usually several ways to do the same thing
R is …
• An interpreted programming language: commands are executed immediately
• Data types: empty values, numerical, logical, character
• Data structures/object types: scalar, vector, matrix, array, data frame, list
• During one session, all objects are stored in your workspace
• Built-in and self-defined functions
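The data types and structures listed above can be sketched in a few lines. This is a minimal illustration only; all object names (n, v, m, df, ...) are made up for this example and do not come from the slides. Note that R has no separate scalar type: a scalar is simply a vector of length 1.

```r
# Basic data types (object names are hypothetical, chosen for illustration)
n  <- 3.14        # numerical
l  <- TRUE        # logical
ch <- "beech"     # character
na <- NA          # missing value
nu <- NULL        # empty object

# Data structures / object types
v  <- c(10, 13, 6)                            # vector
m  <- matrix(1:6, nrow = 2)                   # matrix with 2 rows, 3 columns
a  <- array(1:24, dim = c(2, 3, 4))           # 3-dimensional array
df <- data.frame(id = 1:3, mass = v)          # data frame: columns of equal length
li <- list(number = n, flag = l, table = df)  # list: elements of any type

class(n); class(l); class(ch)   # inspect the type of an object
length(n)                       # a "scalar" is a vector of length 1
dim(m); dim(a)                  # dimensions of matrix and array
str(df)                         # compact overview of a data frame
ls()                            # objects currently stored in the workspace
```

Calling ls() at the end shows that every object created during the session stays in the workspace until it is removed with rm().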
R is plain
Command line language. This is the prompt:
>
All commands follow after the prompt
R is a great calculator
• Simple algebra
> 2+2
[1] 4
• Assign your results to a variable
> x <- 2+2 # assignment operator "<-"
> x^2
[1] 16
• Vector based calculations
> mass <- c(10,13,6) # 3 masses
> acceleration <- c(2.2,1.7,3.1)
> (force <- mass * acceleration)
[1] 22.0 22.1 18.6
R is a great calculator
• Simple statistics
> (x <- sample(1:20,10))
 [1]  4 15 12 14 18  3  9 20 19 16
> mean(x)
[1] 13
> sd(x)
[1] 5.981453
• Set operations: union, intersect, setdiff
• Advanced statistics
> pbinom(40,100,0.5) # coin toss: is the coin unbiased?
[1] 0.02844397
> (pshare <- pbirthday(18,366,coincident=2))
[1] 0.3461382
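The set operations named above come without an example on the slide. A small sketch, using two made-up vectors a and b, might look like this:

```r
# Two hypothetical vectors, invented for illustration
a <- c(1, 3, 5, 7, 9)
b <- c(3, 4, 5, 6)

union(a, b)       # all distinct elements of a and b: 1 3 5 7 9 4 6
intersect(a, b)   # elements found in both: 3 5
setdiff(a, b)     # elements of a that are not in b: 1 7 9
setdiff(b, a)     # order of arguments matters: 4 6
4 %in% a          # membership test: FALSE
```

The functions treat their arguments as sets, so duplicated values are dropped from the result.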
R is a numeric simulator
• Built-in functions for common probability distributions
• E.g. simulate 10 000 pseudo-random numbers, each the number of heads in 100 coin tosses. How often do you get heads?
> heads <- rbinom(10000,100,0.5)
> hist(heads)
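To connect the simulation with the exact probability computed earlier by pbinom(40,100,0.5), the simulated values can be summarised numerically. The following is a sketch only; the seed value and the names p_sim and p_exact are arbitrary choices for this illustration.

```r
set.seed(42)                        # arbitrary seed, only to make the run reproducible
heads <- rbinom(10000, 100, 0.5)    # 10 000 experiments of 100 coin tosses each

mean(heads)                         # average number of heads, close to 50
range(heads)                        # smallest and largest number of heads observed

# Share of simulated experiments with at most 40 heads,
# compared with the exact binomial probability
p_sim   <- mean(heads <= 40)
p_exact <- pbinom(40, 100, 0.5)
c(simulated = p_sim, exact = p_exact)
```

With 10 000 replicates the simulated share should agree with the exact value to roughly two decimal places; more replicates narrow the gap.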
R Probability distributions
Functions:
d (density): probability density function
p (probability): cumulative distribution function
q: calculate quantiles
r: draw random numbers
Examples:
Normal dnorm pnorm qnorm rnorm
Binomial dbinom pbinom …
Poisson dpois ..
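The four prefixes work the same way for every distribution. A brief sketch for the normal distribution, with one binomial and one Poisson call added for comparison (the chosen argument values are arbitrary):

```r
dnorm(0)          # density of the standard normal at 0, equals 1/sqrt(2*pi)
pnorm(1.96)       # P(X <= 1.96), approximately 0.975
qnorm(0.975)      # 97.5 % quantile, approximately 1.96

set.seed(1)       # arbitrary seed for reproducibility
x <- rnorm(5)     # five random draws from the standard normal
x

# q is the inverse of p
all.equal(pnorm(qnorm(0.3)), 0.3)

# Same naming pattern for other distributions
dbinom(50, 100, 0.5)   # P(exactly 50 heads in 100 fair coin tosses)
ppois(2, lambda = 3)   # P(X <= 2) for a Poisson distribution with mean 3
```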
R Probability distributions
> ?Distributions
Function Distribution
_beta() Beta
_binom() Binomial
_cauchy() Cauchy
_chisq() χ²
_exp() Exponential
_f() F
_gamma() Gamma
_geom() Geometric
_hyper() Hypergeometric
_logis() Logistic
_lnorm() Lognormal
_multinom() Multinomial
_nbinom() Negative binomial
_norm() Normal
_pois() Poisson
_signrank() Wilcoxon signed rank statistic (one-sample case)
_t() T
_unif() Uniform
_weibull() Weibull
_wilcox() Wilcoxon rank sum statistic (two-sample case)
R accepts all kinds of data sources
• Files (text, binary, data sets from other statistics programs)
> example <- read.csv("example.csv",header=T)
> example2 <- read.table("example2.txt",header=T)
• Clipboard
> cohesion <- read.table(file="clipboard",sep="\t",header=T)
• Database
> library(RODBC)
> mdbConnect <- odbcConnectAccess("GPDDdist")
> sqlTables(mdbConnect)
• Web
> con <- url('http://anywebsite.com/test.txt')
> example3 <- read.table(con, header=T)
• R objects (binary)
> load("example.RData")
R writes to all kinds of data sources
• To files
> write.csv(example,"example.csv")
> write.table(example,"example2.txt",row.names=F)
• To the clipboard
> write.table(CORMAT,file="clipboard",sep="\t",col.names=NA)
• To databases
> channel <- odbcConnect("test")
> sqlSave(channel, USArrests, rownames = "state", addPK=TRUE)
> close(channel)
• To R objects
> save(example3,file="example.RData")
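The file-based reading and writing calls can be tried without any existing data. The sketch below writes a small made-up data frame to temporary files and reads it back; the data values and the names csv_file, rdata_file and example_csv are assumptions for this illustration, not part of the slides.

```r
# A small hypothetical data set
example <- data.frame(site  = c("A", "B", "C"),
                      count = c(4, 15, 12),
                      stringsAsFactors = FALSE)

# Text file: write, then read back
csv_file <- tempfile(fileext = ".csv")
write.csv(example, csv_file, row.names = FALSE)   # row.names = FALSE avoids an extra index column
example_csv <- read.csv(csv_file, header = TRUE, stringsAsFactors = FALSE)

# Binary R object: save, remove from the workspace, load again
rdata_file <- tempfile(fileext = ".RData")
save(example, file = rdata_file)   # the file name must be passed as file=
rm(example)
load(rdata_file)                   # restores the object under its original name

all.equal(example$count, example_csv$count)   # same values after the round trip
```

Unlike read.csv(), load() does not return the data; it recreates the saved objects in the workspace under the names they had when they were saved.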
R visualising
• Many graphic functions are generic: they respond "intelligently" to different object types
> plot(iris)
> with(iris, plot(Petal.Length,Petal.Width, pch=as.numeric(Species)))
> boxplot(iris)
> boxplot(Petal.Length~Species,data=iris,ylab="Petal.Length")
• R Graph Gallery: http://gallery.r-enthusiasts.com/thumbs.php
R statistical modelling
• Linear model
> fm <- lm(y ~ x, data=dummy)
> summary(fm)

Call:
lm(formula = y ~ x, data = dummy)

Residuals:
    Min      1Q  Median      3Q     Max
-4.3400 -1.7353 -0.2107  1.4644  4.8445

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.9150     1.2155   1.575    0.133
x             0.8581     0.1015   8.457  1.1e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.617 on 18 degrees of freedom
Multiple R-squared:  0.7989, Adjusted R-squared:  0.7877
F-statistic: 71.52 on 1 and 18 DF,  p-value: 1.102e-07
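The slide does not show how the data frame dummy was created. The sketch below therefore simulates a comparable data set with 20 observations; the seed, the true intercept and slope, and the noise level are assumptions, so the numbers will differ from the output above.

```r
set.seed(123)                       # arbitrary seed for reproducibility

# Hypothetical data: linear relationship plus normally distributed noise
x <- 1:20
dummy <- data.frame(x = x,
                    y = 2 + 0.9 * x + rnorm(20, sd = 2.5))

fm <- lm(y ~ x, data = dummy)       # fit the linear model y = a + b*x

coef(fm)                            # estimated intercept and slope
summary(fm)$r.squared               # proportion of variance explained
df.residual(fm)                     # 20 observations - 2 parameters = 18

# Predictions for new x values
predict(fm, newdata = data.frame(x = c(21, 22)))
```

The extractor functions coef(), summary() and predict() work in the same way on most other model objects in R, which makes it easy to swap lm() for a different model type.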
And much more ...
Dormann & Kühn (2009): Angewandte Statistik für die biologischen Wissenschaften.
R geostatistical analyses
• variograms, Kriging etc.
www.mathworks.de
Hengl 2009
R as programming language
> hi.there <- function() {
+ cat("Hello World!\n")
+ }
> hi.there()
Hello World!
• Build your own functions to keep your code tidy
• Build "new" functions (and write packages)
• Dynamic models …
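As a slightly more useful example than "Hello World", the force calculation from the calculator slide could be wrapped in a function. The function name compute_force and the default value for acceleration are hypothetical choices for this sketch.

```r
# Hypothetical helper: force = mass * acceleration
# 'acceleration' defaults to standard gravity (9.81 m/s^2), an assumption for this sketch
compute_force <- function(mass, acceleration = 9.81) {
  # basic input check to catch mistakes early
  stopifnot(is.numeric(mass), is.numeric(acceleration))
  mass * acceleration               # the last evaluated expression is returned
}

compute_force(c(10, 13, 6), c(2.2, 1.7, 3.1))   # vectorised: 22.0 22.1 18.6
compute_force(10)                               # uses the default acceleration: 98.1
```

Because arithmetic in R is vectorised, the function works on single values and on whole vectors without any loop.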
R extensions
• Integrate other source codes
• Batch processing
• Call from terminal
R GUIs
http://www.rcommander.com/
http://rstudio.org/
R Literature
• http://www.r-project.org/
– Manuals
– "Contributed Documentation"