Introduction Data types and structures Control structures Data Handling Basic Analysis
Introduction to R (Urfist)
Moritz Muller, Maître de Conférences at FSEG, Université de Strasbourg
05.-06.12.2019
1 / 109
Introduction
Data types and structures
Control structures
Data Handling
Basic Analysis
Overview: Introduction / What is R?
References
Short list
• Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth edition. Springer, New York. ISBN 0-387-95457-0 (downloadable).
• Ligges, U. (2009). Programmieren mit R. 3rd revised and extended edition. Springer, Heidelberg.
• Venables, W. N., Smith, D. M., and the R Core Team (2015). An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics. Version 3.2.0 (2015-04-16). Download: http://cran.r-project.org/doc/manuals/R-intro.pdf
• Dalgaard, P. (2008). Introductory Statistics with R. Second edition. Springer Science+Business Media LLC, New York. Download: http://www.academia.dk/BiologiskAntropologi/Epidemiologi/PDF/Introductory Statistics with R 2nd ed.pdf
Why (not) use R?
What is R?
R is an object-oriented, interpreted language and programming environment for data analysis and graphics:
• a huge collection of tools for statistics and data analysis
• a language for expressing statistical models and tools
• graphical facilities
• an effective object-oriented programming language that can be extended
Short History of R
Short history
1976 S language developed by Bell Labs at AT&T
1988 S-PLUS, commercial version of S (Statistical Sciences Inc.)
1995 R, open source (GPL) version of S (by Ross Ihaka and Robert Gentleman)
1997 R Development Core Team (in short: R Core Team)
1998 Comprehensive R Archive Network (CRAN) founded
2000 R-1.0.0 first version compatible with S
2001 R News journal appears
2004 R version R-2.0.0 (S4 methods in package methods)
2015 R Consortium (industry support)
Short History of R - submitted packages over time [1]
[1] Gergely Daroczi, source: https://gist.github.com/daroczig
Why (not) use R?
Interpreted language
• programming on the fly
• flexible handling
• slower than a compiled language such as C or C++ (many built-in functions are written in C)
Why (not) use R?
Open Source
• No black box (in principle)
• New methods become available early, e.g.
  • MCMC with Stan (https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started),
  • web scraping with RSelenium (https://www.seleniumhq.org/projects/webdriver/),
  • deep learning with Keras (https://www.tensorflow.org/guide/keras),
  • natural language processing with WordNet (https://wordnet.princeton.edu/), and many more.
• Support from the community
Why (not) use R?
Open Source & Programming needed
• No central authority
• No GUI (Graphical User Interface)
• e.g. offered by SPSS, SAS
Do you want to travel all-inclusive or as a backpacker?
Today’s topics
• Introduction: What is R?, First steps, Examples
• Data types and structures: Data types, Vector, Matrix, Lists, Data.frames
• Control structures: Conditional expressions, Loops
• Data Handling: Read/Write, Data cleaning/formatting, Data Merge/Selection
• Basic Analysis: Plotting, Statistics, Beyond this course
Overview: Introduction / First steps
Installation on your own device:
Install R
• go to http://www.r-project.org/
• Download R for Windows / Mac / Linux
• Install ‘Base distribution’ with default settings
Install RStudio
• go to http://www.rstudio.org/
• Download and install with default settings
Start RStudio
R vs. RStudio: R is the language and its interpreter; RStudio is an integrated development environment (IDE) that runs R. Short tour through RStudio.
R - Calculator
Math expressions
> 1 + 2 * 3
[1] 7
> 2 * 5^2 - 10 * 5 # a comment
[1] 0
> 4 * sin(pi / 2)
[1] 4
> 0 / 0 # not defined (Not a Number)
[1] NaN
Arithmetic Operators, Functions, Values [2]
Operator / function / value    Description
*, /                           multiplication, division
+, -                           addition, subtraction
%%                             modulo
max(), min()                   extrema
abs()                          absolute value
round(), floor(), ceiling()    round (to nearest, down, up)
sum(), prod()                  sum, product
log()                          logarithm (natural by default)
sin(), cos()                   sine, cosine
pi                             the value of pi
Inf, -Inf                      infinity
NaN                            not defined (Not a Number)
NA                             Not Available (missing value)
NULL                           empty object
[2] See R Development Core Team (2008). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
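A few entries of the table tried at the console, as a small sketch. `%/%` (integer division) is not in the table; it is added here only as the usual companion of `%%`.

```r
# modulo and integer division
7 %% 3              # remainder of 7 / 3
7 %/% 3             # integer part of 7 / 3 (not in the table, shown for contrast)

# rounding: to nearest, down, up
round(2.567, digits = 1)
floor(-2.5)
ceiling(-2.5)

# special values and logarithms
1 / 0               # Inf
-1 / 0              # -Inf
log(exp(2))         # log() is the natural logarithm by default
log(100, base = 10) # other bases via the base argument
```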
Assignments
How to ‘remember’ calculations
> x1 <- 3.25 # assign the value 3.25 to object x1
> x1
[1] 3.25
# please don’t use ‘=’ or ‘->’ for assignments
# <<- ‘special’ assignment - kept for later
Declaration and initialisation happen in one step.
Print function - print()
What appears on screen? Try
> x1 <- 3.25
> (x1 <- 3.25)
> print(x1 <- 3.25)
> x1
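One way to check the answers to the exercise above is capture.output(), which records what an expression prints. A plain assignment returns its value invisibly, so nothing appears; wrapping it in parentheses or in print() makes the value visible.

```r
# record what each expression prints on screen
out_plain <- capture.output(x1 <- 3.25)         # assignment: invisible, nothing printed
out_paren <- capture.output((x1 <- 3.25))       # parentheses: value is printed
out_print <- capture.output(print(x1 <- 3.25))  # print(): value is printed
out_name  <- capture.output(x1)                 # typing the name: value is printed

out_plain
out_paren
out_print
out_name
```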
Objects
R is an object oriented language
• Everything in R is an object (data, functions, etc.),
• objects have names (starting with a letter; names are case-sensitive),
• every object has a length(),
• every object is of a certain class - class(),
• for each class, there are special (generic) functions (e.g. print()),
• every object has a mode (data type) - mode(),
• an object may have attributes - attributes(),
• classes may inherit from other classes.
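The properties listed above can be queried directly. This sketch uses a named vector, a function (functions are objects too) and an ordered factor, whose class inherits from "factor".

```r
x <- c(a = 1.5, b = 2)        # a named numeric vector
length(x)
class(x)
mode(x)
attributes(x)                 # the names are stored as an attribute

f <- function(y) y + 1        # functions are objects as well
class(f)

o <- ordered(c("low", "high"), levels = c("low", "high"))
class(o)                      # class "ordered" inherits from class "factor"
inherits(o, "factor")
```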
Find Help
Integrated help
• help.start()
• help("cor"), ?cor
• help.search("correlation"), ??correlation
Find Help
Internet
• check CRAN (the Comprehensive R Archive Network)
• CRAN Task Views
• R mailing lists, discussions, examples
• manuals / vignettes
• R News / The R Journal
• search the web for code snippets - for anything
Find Help
Books
• see the references at the beginning, and many more.
Overview: Introduction / Examples
Faithful - example
Workspace and working directory
Workspace - save what?
• the workspace keeps your R objects
• workspace is located in your RAM
• different environments in your workspace
Workspace and working directory
Workspace - try out
> ls()
?
> search()
?
> ls(name="package:datasets")
?
Workspace and working directory
Working directory - default folder for save/load etc. - try out
> getwd()
[1] "/Users/moritz"
> setwd("/Users/moritz/Documents/Teaching/2019_2020/Urfist")
Overview: Data types and structures / Data types
Data types - mode()
Description              Example    Data type
Basic data types (hierarchical: when types are mixed, values are converted to the type that comes last in this list)
empty object             NULL       NULL
boolean                  TRUE       logical
integers & reals         3.14       numeric
complex numbers          2.13+1i    complex
characters and strings   "Hello"    character
Composite data type
factors                  blue & red factor (the class; mode() returns "numeric")
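The hierarchy becomes visible when values of different types are combined with c(): the result takes the highest type present. A short sketch, using typeof() to see the storage type:

```r
typeof(c(TRUE, 2L))       # logical + integer   -> "integer"
typeof(c(2L, 3.14))       # integer + double    -> "double"
typeof(c(3.14, 2+1i))     # double  + complex   -> "complex"
typeof(c(2+1i, "Hello"))  # complex + character -> "character"
c(TRUE, FALSE) + 0        # logical used as numeric: TRUE = 1, FALSE = 0
c(NULL, 1)                # NULL adds nothing
```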
Logical values in R
Try and explain
> 4 < 3
> (3+1) !=3
> -3<-2
> -3 < -2
> (3 >= 2) & (4 == (3+1))
> c(TRUE, FALSE) & c(TRUE, TRUE)
> x <- c(-4, -5, -1, 0, 2, 4, 5)
> x > 0
> sum(x > 0)
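Two lines of the exercise deserve a comment. Without spaces, -3<-2 is read as the assignment -3 <- 2 (an error), not as a comparison; quote() and deparse() show how R parses an expression without running it. And sum() on a logical vector counts the TRUE values.

```r
deparse(quote(-3<-2))     # parsed as an assignment
deparse(quote(-3 < -2))   # with spaces: a comparison
-3 < -2

x <- c(-4, -5, -1, 0, 2, 4, 5)
x > 0                     # one logical value per element
sum(x > 0)                # TRUE counts as 1: number of positive elements
mean(x > 0)               # share of positive elements
```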
Logical values in R
Function, operator, value   Description
==, !=                      equal, not equal
>, >=                       greater, greater or equal
<, <=                       smaller, smaller or equal
!                           not
&, &&                       and (element-wise on vectors; single values only)
|, ||                       or (element-wise on vectors; single values only)
xor()                       exclusive or
TRUE, FALSE                 true, false
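A sketch of the difference between the vector and single-value operators: & and | work element by element, while && and || expect single values and stop evaluating as soon as the result is known (short-circuit).

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)
a & b                  # element-wise and
a | b                  # element-wise or
xor(TRUE, TRUE)        # exclusive or: TRUE only if exactly one is TRUE

# && and || take single values and short-circuit
x <- -1
x > 0 && sqrt(x) > 1   # sqrt(x) is never evaluated, because x > 0 is FALSE
FALSE && stop("not evaluated")
```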
Logical values in R
Try - any(), all(), which()
> x <- c(2, 0, 3, 1, 6, 9, 7)
> someTrue <- x > 4
> allTrue <- x >= 0
> any(someTrue)
> all(allTrue)
> any(allTrue)
> all(someTrue)
> !any(allTrue)
> all(!someTrue)
> all(!allTrue)
> which(someTrue)
> which(!someTrue)
Logical values in R
Not available - NA - is special
> # convert NA
> x <- NA
> mode(x)
> y <- c(3, x)
> mode(y)
> # comparison and calculation
> # (if part of the input is unknown, the result is unknown)
> y > 2
> sum(y)
> # do not consider (remove) NA elements
> sum(y, na.rm=TRUE)
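Because comparisons with NA give NA, testing for missing values needs is.na(). The sketch also shows the logical exceptions: when the result is known whatever NA stands for, R returns it.

```r
y <- c(3, NA)
is.na(y)               # the way to test for NA
y == NA                # comparing with NA gives NA, not TRUE/FALSE
mean(y)                # NA propagates
mean(y, na.rm = TRUE)  # drop NA first
NA & FALSE             # FALSE, whatever NA stands for
NA | TRUE              # TRUE, for the same reason
```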
Numeric values in R
Try - mode(), typeof()
> (x <- pi)
[1] 3.141593
> mode(x)
[1] "numeric"
> typeof(x)
[1] "double"
> (y <- as.integer(x)) # information loss
[1] 3
> typeof(y)
[1] "integer"
Numeric values in R
Try - conversion
> is.character(y)
[1] FALSE
> x <- -1
> -1^0.5 # parsed as -(1^0.5); x^0.5 gives NaN
[1] -1
> sqrt(as.complex(x))
[1] 0+1i
> (z <- as.character(y))
[1] "3"
> as.numeric("z") # the string "z", not the object z
[1] NA
Warning message:
NAs introduced by coercion
> (y <- as.numeric("-1"))
[1] -1
> y*5
[1] -5
Character values in R
These are just strings. Manipulating them is another story (next course).
"hello world"
a <- "hello world"
Factors in R
(ordered) factors
gulliver <- factor(c("dwarf","giant", "giant",
"giant","dwarf"))
gulliver <- factor(c("dwarf","giant", "giant",
"giant","dwarf"),
levels=c("giant","dwarf","gulliver"))
gulliver <- ordered(c("dwarf","giant", "giant",
"giant","dwarf"),
levels=c("dwarf","gulliver","giant"))
Factors in R
(ordered) factors
> class(gulliver)
[1] "ordered" "factor"
> str(gulliver)
Ord.factor w/ 3 levels "dwarf"<"gulliver"<..: 1 3 3 3 1
> mode(gulliver)
[1] "numeric"
> length(gulliver)
[1] 5
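A few more things to try with the ordered factor from above: the levels, the internal integer codes (hence mode "numeric"), counts per level, and comparisons, which are allowed because the factor is ordered.

```r
gulliver <- ordered(c("dwarf", "giant", "giant", "giant", "dwarf"),
                    levels = c("dwarf", "gulliver", "giant"))
levels(gulliver)        # the categories, in the given order
as.integer(gulliver)    # internal codes
table(gulliver)         # counts per level, including the unused "gulliver"
gulliver > "gulliver"   # comparison follows the order of the levels
droplevels(gulliver)    # remove unused levels
```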
Overview: Data types and structures / Vector
Data structures
A data structure is a representation of data; it defines how we operate on the data.
Data structures in R
• vector
• matrix
• array
• list
• data.frame
For each data structure there is a function to declare and initialise an object, e.g. vector(), matrix(), array(), list(), data.frame().
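A minimal sketch that declares and initialises one (empty or zero-filled) object of each data structure in the list:

```r
v <- vector(mode = "numeric", length = 3)      # numeric vector of zeros
m <- matrix(0, nrow = 2, ncol = 3)             # 2 x 3 matrix of zeros
a <- array(0, dim = c(2, 2, 2))                # 2 x 2 x 2 array of zeros
l <- list()                                    # empty list
d <- data.frame(id = 1:2, name = c("a", "b"))  # 2 rows, 2 columns
str(list(v = v, m = m, a = a, l = l, d = d))
```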
Vector
characteristics of vectors
> x <- c(4.1, 4.3, 2, 54, 34) # c() for concatenate
>
> class(x)
[1] "numeric"
> mode(x)
[1] "numeric"
> typeof(x)
[1] "double"
> str(x)
num [1:5] 4.1 4.3 2 54 34
> is.vector(x)
[1] TRUE
> length(x)
[1] 5
> attributes(x)
NULL
Vector
characteristics of vectors
> # a vector holds only one data type: mixed values are converted
> c(x, FALSE, TRUE)
[1] 4.1 4.3 2.0 54.0 34.0 0.0 1.0
> c(x, FALSE, TRUE, "Hello")
[1] "4.1" "4.3" "2" "54" "34" "FALSE" "TRUE" "Hello"
Vector
characteristics of vectors
> x <- c(height=10, width=4)
> x
height width
10 4
> attributes(x)
$names
[1] "height" "width"
> names(x)
[1] "height" "width"
Vector
sequences
> 2:5
[1] 2 3 4 5
> seq(2,5,by=2)
[1] 2 4
> rep(2:4,times=3)
[1] 2 3 4 2 3 4 2 3 4
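Some further arguments of seq() and rep(), sketched here: a fixed number of points instead of a step size, decreasing sequences, and repeating each element rather than the whole vector.

```r
seq(0, 1, length.out = 5)          # 5 equally spaced values from 0 to 1
seq(5, 1)                          # decreasing sequence
seq_len(4)                         # 1 to 4
rep(2:4, each = 2)                 # repeat each element
rep(c("a", "b"), times = c(3, 1))  # a different count per element
```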
Vector
Calculate with vectors
> x <- c(2, 4, 8)
> x*2
[1] 4 8 16
> x-3
[1] -1 1 5
>
> x * c(3,2,1)
[1] 6 8 8
> x * c(3,2)
[1] 6 8 24
Warning message:
In x * c(3, 2) :
longer object length is not a multiple of shorter object length
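The rule behind these results is recycling: the shorter vector is repeated until it matches the longer one. When the longer length is a multiple of the shorter one, there is no warning; otherwise R still computes a result but warns. The handler below only reports and silences the warning for the demonstration.

```r
x <- c(1, 2, 3, 4)
x * c(1, 10)       # c(1, 10) is recycled to c(1, 10, 1, 10): no warning
x + 100            # a single value is recycled to every element

# lengths 4 and 3 do not fit: the result is computed, but with a warning
res <- withCallingHandlers(
  x * c(1, 10, 100),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res
```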
Vector
Calculate with vectors
> t(2:4) %*% 1:3
[,1]
[1,] 20
> 2:4 %*% 1:3
[,1]
[1,] 20
> 2:4 %*% t(1:3)
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 3 6 9
[3,] 4 8 12
Vector
Indexing vectors
> x
[1] 2 4 8
> x[2] # 2nd element by position
[1] 4
> x[c(FALSE,TRUE,FALSE)] # 2nd element with logical value
[1] 4
> names(x) <- c("Height","Length","Width")
> x["Length"] # access by name
Length
4
> x[-2] # without 2nd element (the names are kept)
Height Width
2 8
Vector
Indexing vectors
# vector x
x <- c(2, 4, 8)
# extract by boolean vector
idx <- x < 5
print(idx)
x[idx]
# overwrite
x[idx] <- 2
# extract by position vector
pos <- which(x < 5)
print(pos)
x[pos]
# overwrite
x[pos] <- 1
Vector
Indexing vectors
> x[3,2,1] # reverse order?
Error in x[3, 2, 1] : incorrect number of dimensions
> x[c(3,2,1)] # reverse order!
[1] 8 4 2
> x[] <- -2 # replace every element by -2
> x
[1] -2 -2 -2
> x <- -2 # overwrite x by -2
> x
[1] -2
Overview: Data types and structures / Matrix
Matrix
initialise a Matrix - try and explain
x <- 1:3
y <- 4:6
cbind(x,y)
rbind(x,y)
matrix(1:6, nrow=3, ncol=2)
matrix(1:6, nrow=3, byrow=TRUE)
matrix(1:6, ncol=2, byrow=TRUE)
matrix(1:6, nrow=3, ncol=4, byrow=TRUE)
Matrix
Operate on matrices - try and explain
A <- matrix(c(1,1,2,2), nrow=2, byrow=TRUE)
B <- t(A) # transpose
solve(A) # inverse (this A is singular, so solve() stops with an error)
# elementwise operations
A * B
A / B
A - B
A^B
A == B
A == t(B)
any(A==B)
# matrix multiplication
A %*% B
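The matrix A above has a second row that is twice the first, so it has no inverse. With an invertible matrix (A2 below is a made-up example), the inverse times the matrix gives the identity up to rounding, and solve(A, b) solves a linear system directly.

```r
A2 <- matrix(c(2, 1,
               1, 3), nrow = 2, byrow = TRUE)  # determinant 2*3 - 1*1 = 5
det(A2)
A2_inv <- solve(A2)            # inverse
A2_inv %*% A2                  # identity matrix (up to rounding)
solve(A2, c(1, 2))             # solves A2 %*% x = c(1, 2)

# the singular A from the slide: catch the error instead of stopping
A <- matrix(c(1, 1, 2, 2), nrow = 2, byrow = TRUE)
tryCatch(solve(A), error = function(e) "A is singular")
```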
Matrix
Access matrix elements - exercise
> X <- matrix(1:6, nrow=3)
> rownames(X) <- c("length", "width", "height")
> colnames(X) <- c("shelf","chair")
• read length of shelf by [row pos., column pos.]
• read length of shelf by [row name, column name]
• read length of shelf by [row name, column pos.]
• overwrite length of shelf with 14
• get all dimensions of shelf
• get length of all furniture
• overwrite length of all furniture by prior length times 100
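One possible solution to the exercise, in the order of the bullet points (other indexings work as well):

```r
X <- matrix(1:6, nrow = 3)
rownames(X) <- c("length", "width", "height")
colnames(X) <- c("shelf", "chair")

X[1, 1]                                 # by position
X["length", "shelf"]                    # by names
X["length", 1]                          # name and position mixed
X["length", "shelf"] <- 14              # overwrite (the matrix becomes double)
X[, "shelf"]                            # all dimensions of the shelf
X["length", ]                           # length of all furniture
X["length", ] <- X["length", ] * 100    # overwrite the whole row
X
```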
Matrix
Matrix features
> class(X) # since R 4.0.0: "matrix" "array"
[1] "matrix"
> typeof(X)
[1] "integer"
> mode(X)
[1] "numeric"
Matrix
Matrix features
> attributes(X)
$dim
[1] 3 2
$dimnames
$dimnames[[1]]
[1] "length" "width" "height"
$dimnames[[2]]
[1] "shelf" "chair"
Matrix
Matrix features
> str(X)
int [1:3, 1:2] 1 2 3 4 5 6
- attr(*, "dimnames")=List of 2
..$ : chr [1:3] "length" "width" "height"
..$ : chr [1:2] "shelf" "chair"
In R, matrices are vectors with a dim attribute, i.e. special cases of vectors (in mathematics, it is the other way round).
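A short sketch of that statement: giving a plain vector a dim attribute turns it into a matrix (filled column by column), and dropping the attributes gives the vector back.

```r
v <- 1:6
is.matrix(v)
dim(v) <- c(3, 2)   # add a dim attribute: now a 3 x 2 matrix
is.matrix(v)
v
as.vector(v)        # drop the attributes: the plain vector again
```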
Array
Array - like matrix but possibly more dimensions
Avoid arrays. Don’t use arrays with more than 3 dimensions unless absolutely necessary. Use a map of the array.
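For the cases where an array is needed, a small three-dimensional sketch: indexing takes one position per dimension, and each "layer" is an ordinary matrix.

```r
a <- array(1:24, dim = c(2, 3, 4))  # 2 rows, 3 columns, 4 layers
dim(a)
a[2, 3, 4]                          # row 2, column 3, layer 4
a[, , 1]                            # the first layer: a 2 x 3 matrix
apply(a, 3, sum)                    # one sum per layer
```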
Overview: Data types and structures / Lists
List
List - features
• very flexible,
• may contain different data types,
• also lists (recursive)
List
List - features
> (L1 <- list("this is a", 100, matrix(1:4,nrow=2),
list(TRUE,FALSE)))
[[1]]
[1] "this is a"
[[2]]
[1] 100
[[3]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
[[4]]
[[4]][[1]]
[1] TRUE
[[4]][[2]]
[1] FALSE
List
List - features
> (L1 <- list(a="this is a", b=100, c=matrix(1:4,nrow=2),
truth=list(TRUE,FALSE)))
$a
[1] "this is a"
$b
[1] 100
$c
[,1] [,2]
[1,] 1 3
[2,] 2 4
$truth
$truth[[1]]
[1] TRUE
$truth[[2]]
[1] FALSE
List
List - features
> L1[[1]] # access by position
[1] "this is a"
> L1[["a"]] # access by name
[1] "this is a"
>
> L1$b # access by name (shortcut; also matches partial names)
[1] 100
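The difference between the access forms, sketched: single brackets return a sub-list, double brackets return the element itself, and $ also accepts a partial name while [[ ]] by default does not.

```r
L1 <- list(a = "this is a", b = 100, truth = list(TRUE, FALSE))
class(L1["a"])        # single brackets: a list with one element
class(L1[["a"]])      # double brackets: the element itself
L1[c("a", "b")]       # single brackets can select several elements
L1$tr                 # partial name "tr" matches "truth"
L1[["tr"]]            # exact name required by default: NULL
L1$truth[[2]]         # nested access
```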
List
List - features
> L1[[1]] <- NULL
> L1
$b
[1] 100
$c
[,1] [,2]
[1,] 1 3
[2,] 2 4
$truth
$truth[[1]]
[1] TRUE
$truth[[2]]
[1] FALSE
Overview: Data types and structures / Data.frames
data.frames
data.frames
• the most typical structure for data sets
• data.frames are lists whose entries (the columns) are vectors of the same length
• data.frame()
Data.frame
initialise Data.frame
> shopping <- data.frame(Product=c("cheese","wine","bread"),
Unit=c("grams","bottles","loaf"), Amount=c(300,2,2))
>
> str(shopping) # R < 4.0.0; since R 4.0.0 text columns stay chr unless stringsAsFactors=TRUE
'data.frame': 3 obs. of 3 variables:
$ Product: Factor w/ 3 levels "bread","cheese",..: 2 3 1
$ Unit : Factor w/ 3 levels "bottles","grams",..: 2 1 3
$ Amount : num 300 2 2
Data.frame
Data.frame - access elements
> shopping[2,]
Product Unit Amount
2 wine bottles 2
> shopping[2,3]
[1] 2
> shopping[2,"Amount"]
[1] 2
> shopping$Amount[2]
[1] 2
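Beyond single elements, rows are often selected by a condition and columns by name, and new columns are added with $. The prices below are made-up values for illustration only.

```r
shopping <- data.frame(Product = c("cheese", "wine", "bread"),
                       Unit = c("grams", "bottles", "loaf"),
                       Amount = c(300, 2, 2))
shopping[shopping$Amount < 10, ]        # rows selected by a condition
shopping[, c("Product", "Amount")]      # columns selected by name
shopping$Price <- c(4.50, 12.00, 1.20)  # hypothetical prices as a new column
nrow(shopping)
ncol(shopping)
```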
Data.frame
Plot two regimes of Old Faithful
[Figure: scatter plot of eruptions (y-axis, 1 to 6) against waiting (x-axis, 40 to 100)]
Data.frame
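Plot two regimes of Old Faithful (by subsetting the data.frame)
# split the geyser data; assumption: regimes separated at the dotted lines
# (waiting = 70, eruptions = 3.5)
in1 <- faithful$waiting < 70 & faithful$eruptions < 3.5
in2 <- faithful$waiting >= 70 & faithful$eruptions >= 3.5
regime1 <- faithful[in1, ]
regime2 <- faithful[in2, ]
rest <- faithful[!(in1 | in2), ]
# plot geyser data
plot(NA, xlim=c(40,100), ylim=c(1,6), xlab="waiting", ylab="eruptions")
points(regime1[,"waiting"], regime1[,"eruptions"], col="red")
points(regime2[,"waiting"], regime2[,"eruptions"], col="blue")
points(rest[,"waiting"], rest[,"eruptions"], col="grey")
lines(x=c(70,70), y=c(0,7), lty=3)
lines(x=c(40,100), y=c(3.5,3.5), lty=3)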
Data.frame
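Plot two regimes of Old Faithful (by adding a variable to the data.frame)
# colour variable; assumption: same split as on the previous slide
faithful$color <- "grey"
faithful$color[faithful$waiting < 70 & faithful$eruptions < 3.5] <- "red"
faithful$color[faithful$waiting >= 70 & faithful$eruptions >= 3.5] <- "blue"
# plot geyser data
plot(faithful$waiting, faithful$eruptions, col=faithful$color,
     xlim=c(40,100), ylim=c(1,6),
     xlab="waiting", ylab="eruptions")
lines(x=c(70,70), y=c(0,7), lty=3)
lines(x=c(40,100), y=c(3.5,3.5), lty=3)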
Overview: Control structures / Conditional expressions
Conditional expressions
If...else
> x <- 1
> if(x > 0){
+ print("x is pos.")
+ }else{
+ print("x is not pos.")
+ }
[1] "x is pos."
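More than two cases can be chained with else if. A sketch with a hypothetical helper, sign_label(), that also separates zero from the negative numbers:

```r
# hypothetical helper: classify a single number
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}
sign_label(1)
sign_label(0)
sign_label(-2)
```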
Conditional expressions
Ifelse
# vector-oriented, use for simple expr.
> x <- c(1,2,3,4)
> ifelse(x > 2, "A", "B")
[1] "B" "B" "A" "A"
Conditional expressions
switch
> switch(2, a=11, b=12, cc=13, d=14)
[1] 12
> (switch("c", a=11, b=12, cc=13, d=14))
NULL
> switch("cc", a=11, b=12, cc=13, d=14)
[1] 13
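Two more switch() features, sketched with a hypothetical helper to_grams(): an alternative without a value falls through to the next one, and an unnamed last argument acts as the default when no name matches.

```r
# hypothetical helper: conversion factor from a weight unit to grams
to_grams <- function(unit) {
  switch(unit,
         g = ,                          # no value: falls through to grams
         grams = 1,
         kg = 1000,
         stop("unknown unit: ", unit))  # unnamed last argument: the default
}
to_grams("g")
to_grams("kg")
tryCatch(to_grams("lb"), error = function(e) conditionMessage(e))
```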
Overview: Control structures / Loops
Loops
for()
> for(i in 1:3){
+ print(i)
+ }
[1] 1
[1] 2
[1] 3
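When looping over the positions of a vector, seq_along() is safer than 1:length(): for an empty vector, 1:length(y) is c(1, 0) and the loop body would run twice. A for loop can also run over the values directly.

```r
x <- c(10, 20, 30)
total <- 0
for (i in seq_along(x)) {   # positions 1, 2, ..., length(x)
  total <- total + x[i]
}
total

y <- numeric(0)
1:length(y)                 # c(1, 0): a loop would run twice
seq_along(y)                # empty: a loop body would be skipped

for (name in c("cheese", "wine")) print(name)  # loop over the values
```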
Loops
repeat
> i <- 0
> repeat{
+ i <- i+1
+ if(i < 3){
+ next # start next turn
+ }
+ print(i)
+
+ if(i == 3){
+ break # exit loop
+ }
+
+ }
[1] 3
Loops
while
> i <- 0
> while(i < 3){
+ print(i)
+ i <- i + 1
+ }
[1] 0
[1] 1
[1] 2
Loops in R
Short note on loops and efficiency
• R code is run by an interpreter (not compiled ahead of time): in a loop, every iteration is evaluated again, which adds overhead.
• Vectorised code is typically faster: one function call hands the whole vector to compiled (mostly C) code, which fetches, calculates and writes every value without this per-element overhead. Vectorised code, however, may be
  • slower (if the vectors become very large),
  • difficult to read (if several objects are handled),
  • impossible (if each step depends on the previous one).
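A sketch to compare both styles on your own machine (timings vary, so none are claimed here). The last lines show the sequential case: each value of the logistic map, used only as an illustration, needs the previous one, so a loop is the natural choice.

```r
set.seed(1)
x <- runif(1e6)

# loop version, with the result vector preallocated
square_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

t_loop <- system.time(r_loop <- square_loop(x))
t_vec  <- system.time(r_vec <- x^2)   # vectorised: one call
identical(r_loop, r_vec)              # same result
rbind(loop = t_loop[["elapsed"]], vectorised = t_vec[["elapsed"]])

# sequential dependence: z[i] needs z[i - 1]
z <- numeric(10)
z[1] <- 0.5
for (i in 2:10) z[i] <- 3.7 * z[i - 1] * (1 - z[i - 1])
z
```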
Exercise
• Exercise ‘R shaked’
• Exercise ‘Normal Distribution uncommented’
Overview: Data Handling / Read/Write
Read/Write
Import/export, read/write
• R objects (load(), save())
• text-files (read.table() / write.table(), scan() / cat())
• operate on Excel/Access/databases (e.g. RODBC or RMySQL)
• SAS/SPSS/Stata (e.g. foreign)
See the manual “R Data Import/Export”.
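A round trip with the two base-R routes from the list, as a sketch: a tab-separated text file via write.table()/read.table(), and R objects via save()/load(). Temporary files stand in for real paths.

```r
dat <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))

# text file: write.table() / read.table()
f_txt <- tempfile(fileext = ".txt")
write.table(dat, file = f_txt, sep = "\t", row.names = FALSE)
dat_txt <- read.table(f_txt, header = TRUE, sep = "\t")

# R objects: save() / load()
f_rda <- tempfile(fileext = ".RData")
save(dat, file = f_rda)
dat_orig <- dat
rm(dat)
load(f_rda)   # restores the object under its original name, dat
dat
```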
Overview: Data Handling / Data cleaning/formatting
Character string handling
Function                 Description
scan() / cat()           read from / print to file (or console)
parse() / deparse()      convert text into an expression (and vice versa)
grep()                   regular expressions (pattern search)
nchar()                  number of characters in a string
strsplit() / paste()     split / join strings
See e.g. Sanchez, G. (2013). Handling and Processing Strings in R. Trowchez Editions, Berkeley. http://www.gastonsanchez.com/Handling and Processing Strings in R.pdf
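Most of the functions in the table, tried on one sentence. Note that strsplit() returns a list (one element per input string), hence the [[1]].

```r
s <- "Introduction to R"
nchar(s)                        # number of characters
words <- strsplit(s, " ")[[1]]  # split at spaces
words
paste(words, collapse = "_")    # join again
grep("tion", words)             # positions of elements matching a pattern
grepl("^t", words)              # the same question as TRUE/FALSE
e <- parse(text = "1 + 2")      # text -> expression
eval(e[[1]])
deparse(quote(x + y))           # expression -> text
cat("Hello", "world\n")         # print without [1] and quotes
```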
Overview: Data Handling / Data Merge/Selection
Data Merge/Selection
How to join and split data.frames?
• merge(),
• rbind()
• cbind()
• subset()
• idx <- x == y; df[idx, c("A","B")]
Note: merging/splitting very large tables (millions of entries) is better done in a database.
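The functions from the list on two small made-up tables, as a sketch: merge() joins on a key column (only matching keys by default, all rows of the first table with all.x = TRUE), rbind()/cbind() stack rows/columns, and subset() filters rows and selects columns.

```r
# hypothetical example data
people <- data.frame(id = 1:3, name = c("Ann", "Bob", "Cem"))
scores <- data.frame(id = c(1, 3, 4), score = c(12, 15, 9))

inner <- merge(people, scores, by = "id")                 # ids in both tables
left  <- merge(people, scores, by = "id", all.x = TRUE)   # all people; no score -> NA
both  <- rbind(people, data.frame(id = 4, name = "Dan"))  # add rows (same columns)
wide  <- cbind(people, city = c("Paris", "Lyon", "Nice")) # add a column
high  <- subset(scores, score > 10, select = c(id, score))
inner
left
high
```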
Overview: Basic Analysis / Plotting
R plotting
How to make good graphs
1. Excel plots
2. R plots - various devices - pdf, ps etc.
3. TikZ plots (vector graphics in LaTeX) - tikzDevice - NEW
4. Plotly - interactive web-based graphs via the open source JavaScript graphing library - NEWER
R plotting
Devices
• A device defines where to plot, similar to paper, parchment etc.
• the same graph can look different on different devices
• support for (transparent) colours differs by device
R plotting - pdf-device
[Figure: boxplots "Regional spread of Greek GDP", x-axis: Year (2000 to 2009), y-axis: k Euro per capita (about 10 to 30)]
Scientific plotting - tikzDevice
[Figure: the same boxplots "Regional spread of Greek GDP" (Year vs. k Euro per capita), produced with tikzDevice]
R plotting
Typical process
Step 0: try it out on the default device.
f <- file.path(getwd(), "GreekGDP.pdf") # define file
pdf(file=f, width=4, height=3) # open device
# plot into device (gdp2 and idx: the course's GDP data, not included here)
boxplot(gdp2[idx,"EUR_HAB"]/1000 ~ gdp2[idx,"TIME"], notch=TRUE,
        xlab="Year", ylab="k Euro per capita",
        main="Regional spread of Greek GDP")
dev.off() # close device
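The same open / plot / close pattern as a runnable sketch. Since the GDP data are not part of this text, the built-in faithful data stand in for them, and tempfile() replaces the working directory; both are assumptions for illustration.

```r
f <- tempfile(fileext = ".pdf")        # temporary file instead of getwd()
pdf(file = f, width = 4, height = 3)   # 1. open the device (size in inches)
group <- ifelse(faithful$waiting < 70, "< 70", ">= 70")
boxplot(faithful$eruptions ~ group,    # 2. plot into the device
        xlab = "waiting (min)", ylab = "eruptions (min)",
        main = "Old Faithful")
dev.off()                              # 3. close the device: the file is written
file.exists(f)
```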
R plotting
Plot functions
• plot() is a generic function: what it draws depends on the class of the object to plot.
• add to an existing plot: points(), lines(), text().
• barplot(), boxplot(), contour(), hist(), etc.
• plot functions from special packages
R plotting
• options as arguments of the plot function, e.g. cex
• options set for all following plots through par()
R plotting
Arguments frequently used in plot functions and par()
Argument    Description
axes        draw axes or not (fine-tuned later, e.g. with axis())
cex         size multiplier for points and letters
lty, lwd    line type, line width
pch         point symbol
xlab, ylab  axis labels
xlim, ylim  axis ranges
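Most arguments from the table in one sketch, written to a temporary pdf file: par() sets two panels side by side and smaller margins for all following plots (and returns the old settings so they can be restored), while the other arguments apply to a single call.

```r
f <- tempfile(fileext = ".pdf")
pdf(f, width = 7, height = 3.5)
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))  # 1 x 2 panels, smaller margins

plot(faithful$waiting, faithful$eruptions,
     pch = 19, cex = 0.6,                 # filled dots at 60 % size
     xlab = "waiting", ylab = "eruptions",
     xlim = c(40, 100), ylim = c(1, 6))
abline(v = 70, h = 3.5, lty = 3, lwd = 2) # dotted reference lines

hist(faithful$eruptions, main = "", xlab = "eruptions", axes = FALSE)
axis(1)                                   # axes added afterwards
axis(2, las = 1)                          # horizontal tick labels

par(old_par)                              # restore the previous settings
dev.off()
```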
R plotting - regions
par() parameters oma/mar [3]
[3] http://rgraphics.limnology.wisc.edu/rmargins sf.php
R plotting - regions
par() parameters mfcol/mfrow [4]
[4] http://rgraphics.limnology.wisc.edu/rmargins mfcol.php
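A sketch of the two layout parameters and the outer margin: mfrow fills a grid of panels row by row, mfcol column by column, and par("mfg") reports where the current panel sits. oma reserves an outer margin around all panels, into which mtext(outer = TRUE) writes.

```r
f <- tempfile(fileext = ".pdf")
pdf(f, width = 6, height = 6)

par(mfrow = c(2, 2))        # fill the 2 x 2 grid row by row
plot(1:10); plot(10:1)
pos_row <- par("mfg")[1:2]  # row and column of the current (2nd) panel

par(mfcol = c(2, 2))        # fill the 2 x 2 grid column by column
plot(1:10); plot(10:1)
pos_col <- par("mfg")[1:2]

par(mfrow = c(1, 2), oma = c(0, 0, 3, 0), mar = c(4, 4, 1, 1))  # outer margin on top
plot(faithful$waiting); plot(faithful$eruptions)
mtext("Old Faithful", side = 3, outer = TRUE, line = 1)         # title in the outer margin
dev.off()
pos_row
pos_col
```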
R plotting
General remarks
• Everything can be done,
• but it may take time to figure out how, and to code it.
Overview: Basic Analysis / Statistics
Descriptives
Check out summary(), table(), prop.table(), cor(), rcorr() (package Hmisc), etc.
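The base-R functions from this list applied to the built-in faithful data, as a starting sketch (rcorr() is left out because it needs the Hmisc package). The split at 70 minutes of waiting is an arbitrary choice for illustration.

```r
summary(faithful)                             # min, quartiles, mean, max per column
long_wait <- faithful$waiting >= 70           # a logical grouping variable
tab <- table(long_wait)                       # counts
prop.table(tab)                               # shares (they sum to 1)
cor(faithful$waiting, faithful$eruptions)     # Pearson correlation
table(long_wait, faithful$eruptions >= 3.5)   # two-way table
```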
Descriptives - free work
What’s the chance of a happy end? Use the ‘Titanic’ data in the package ‘datasets’.
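A possible first step for this free work, sketched: Titanic is a four-way table (Class, Sex, Age, Survived), so margin.table() sums over the dimensions you are not interested in, and prop.table() turns counts into shares.

```r
dim(Titanic)                          # Class x Sex x Age x Survived
dimnames(Titanic)$Survived
surv <- margin.table(Titanic, 4)      # sum over everything except Survived
prop.table(surv)                      # overall share of survivors
by_class <- margin.table(Titanic, c(1, 4))
round(prop.table(by_class, 1), 2)     # survival share within each class
```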
Descriptives - free work
Bring your project to the classroom
YOUR PROJECT HERE.
Overview: Basic Analysis / Beyond this course
Beyond this course
Inductive stats
Check out lm(), glm(), and the CRAN task views.
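A first look at both functions, sketched on built-in data sets: a linear regression of eruption time on waiting time, and a logistic regression (glm with the binomial family) of the gearbox type in mtcars (am = 1: manual) on car weight. The models are illustrations, not recommendations.

```r
fit <- lm(eruptions ~ waiting, data = faithful)  # linear regression
coef(fit)                                        # intercept and slope
summary(fit)$r.squared

# logistic regression: probability of a manual gearbox given weight (1000 lbs)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
coef(fit_glm)
predict(fit_glm, newdata = data.frame(wt = 3), type = "response")
```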
Non-parametrics
Check out density() in base R, the np package, and the CRAN task views.
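A kernel density estimate with density() (Gaussian kernel and bandwidth rule bw.nrd0 by default), sketched on the eruption times; locating the local maxima of the estimate is one simple way to see the two regimes from the plotting example.

```r
d <- density(faithful$eruptions)   # kernel density estimate
d$bw                               # bandwidth chosen by the default rule
# local maxima of the estimated density
peaks <- d$x[which(diff(sign(diff(d$y))) == -2) + 1]
round(peaks, 2)
```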
Bayesian
Check out the rstan package and the CRAN task views.