Introduction Data types and structures Control structures Data Handling Basic Analysis

Introduction to R

Urfist

Moritz Müller, Maître de Conférences at FSEG, Université de Strasbourg

05.-06.12.2019


Introduction

Data types and structures

Control structures

Data Handling

Basic Analysis


Overview

• Introduction: What is R?, First steps, Examples
• Data types and structures: Data types, Vector, Matrix, Lists, Data.frames
• Control structures: Conditional expressions, Loops
• Data Handling: Read/Write, Data cleaning/formatting, Data Merge/Selection
• Basic Analysis: Plotting, Statistics, Beyond this course

References

Short list

• Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0 (downloadable)

• Ligges, U. (2009) Programmieren mit R. 3., überarb. u. erweiterte Auflage. Springer Verlag, Heidelberg.

• Venables, W. N., Smith, D. M., and the R Core Team (2015) An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics. Version 3.2.0 (2015-04-16). Download: http://cran.r-project.org/doc/manuals/R-intro.pdf

• Dalgaard, P. (2008) Introductory Statistics with R. Second Edition. Springer Science+Business Media LLC, New York. Download: http://www.academia.dk/BiologiskAntropologi/Epidemiologi/PDF/Introductory Statistics with R 2nd ed.pdf

Why (not) use R?


What is R?

Object-oriented, interpreted language and programming environment for data analysis and graphics.

• huge collection of tools for statistics and data analysis

• a language for expressing statistical models and tools

• graphical facilities

• effective object-oriented programming language that can be extended

Short History of R

Short history

1976 S language developed by Bell Labs at AT&T

1988 S-PLUS, commercial version of S (Statistical Sciences Inc.)

1995 R, open source (GPL) version of S (by Ross Ihaka and Robert Gentleman)

1997 R Development Core Team (in short: R Core Team)

1998 Comprehensive R Archive Network (CRAN) founded

2000 R-1.0.0 first version compatible with S

2001 R News journal appears

2004 R version R-2.0.0 (S4 methods in package methods)

2015 R consortium (industry support)


Short History of R - submitted packages over time [1]

[1] Gergely Daroczi, source: https://gist.github.com/daroczig

Why (not) use R?

Interpreted language

• programming on the fly

• flexible handling

• slower than a compiled language such as C, C++ (many built-in functions are in C)

Why (not) use R?

Open Source

• No black box (in principle)

• New methods are available early, e.g.

  • MCMC with Stan (https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started),

  • Web scraping with RSelenium (https://www.seleniumhq.org/projects/webdriver/),

  • Deep learning with Keras (https://www.tensorflow.org/guide/keras),

  • Natural language processing with WordNet (https://wordnet.princeton.edu/), and many more.

• Support from the community


Why (not) use R?

Open Source & Programming needed

• No central authority

• No GUI (Graphical User Interface)

  • e.g. as offered by SPSS, SAS

Do you want to travel all-inclusive or as a backpacker?

Today's topics

• Introduction: What is R?, First steps, Examples
• Data types and structures: Data types, Vector, Matrix, Lists, Data.frames
• Control structures: Conditional expressions, Loops
• Data Handling: Read/Write, Data cleaning/formatting, Data Merge/Selection
• Basic Analysis: Plotting, Statistics, Beyond this course

Installation on your own device:

Install R

• go to http://www.r-project.org/

• Download R for Windows / Mac / Linux

• Install ‘Base distribution’ with default settings

Install RStudio

• go to http://www.rstudio.org/

• Download and install with default settings

Start RStudio

Explain R vs. RStudio. Short tour through RStudio.

R - Calculator

Math expressions

> 1 + 2 * 3

[1] 7

> 2 * 5^2 - 10 * 5 # a comment

[1] 0

> 4 * sin(pi / 2)

[1] 4

> 0 / 0 # not defined (Not a Number)

[1] NaN


Arithmetic Operators, Functions, Values [2]

Operators, Functions, Values    Description

*, /                            multiplication, division
+, -                            addition, subtraction
%%                              modulo

max(), min()                    extrema
abs()                           absolute value
round(), floor(), ceiling()     round (to nearest, down, up)
sum(), prod()                   sum, product
log()                           logarithm
sin(), cos()                    sine, cosine

pi                              the value of pi
Inf, -Inf                       infinity
NaN                             not defined (Not a Number)
NA                              Not Available
NULL                            empty set

[2] See R Development Core Team (2008): R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
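A few of the operators, functions and values from the table can be tried out as follows. This is only a small sketch; the vector v is a made-up example object.

```r
# Integer remainder and rounding helpers
7 %% 3            # remainder of 7 divided by 3
floor(2.7)        # round down
ceiling(2.1)      # round up
round(2.567, 2)   # round to two decimal places

# Aggregates over a vector (v is a hypothetical example)
v <- c(3, -1, 4)
max(v); min(v); sum(v); prod(v); abs(v)

# Special values
log(exp(1))       # natural logarithm by default
1 / 0             # Inf
is.nan(0 / 0)     # TRUE: 0/0 is not a number
is.na(NA)         # TRUE: missing value
length(NULL)      # 0: NULL has no elements
```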


Assignments

How to ‘remember’ calculations

> x1 <- 3.25 # assign the value 3.25 to object x1

> x1

[1] 3.25

# please don’t use ‘=’ or ‘->’ for assignments

# <<- ‘special’ assignment - kept for later

Declaration and initialisation at once.


Print function - print()

What appears on screen? Try

> x1 <- 3.25

> (x1 <- 3.25)

> print(x1 <- 3.25)

> x1
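One way to check what each variant prints is to capture the console output. The sketch below uses capture.output() from the utils package; the names out1 to out3 are made up for this illustration.

```r
out1 <- capture.output(x1 <- 3.25)         # plain assignment: nothing is printed
out2 <- capture.output((x1 <- 3.25))       # parentheses make the value visible
out3 <- capture.output(print(x1 <- 3.25))  # print() displays the assigned value
out1
out2
out3
```

The assignment itself returns its value invisibly, which is why the parentheses or print() are needed to see it.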


Objects

R is an object oriented language

• Everything in R is an object (data, functions, etc.),

• objects have names (start with letter, upper-/lower-case),

• every object has a length(),

• every object is of a certain class - class(),

• for each class, there are special (generic) functions (e.g. print()),

• every object has a mode (data type) - mode(),

• an object may have attributes - attributes(),

• classes may inherit from other classes.
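The properties listed above can be inspected directly. A minimal sketch, with x, f and d as made-up example objects:

```r
x <- c(a = 1.5, b = 2.5)   # a named numeric vector (hypothetical example)
length(x)       # number of elements
class(x)        # "numeric"
mode(x)         # "numeric"
attributes(x)   # the names are stored as an attribute

f <- function(z) z + 1     # functions are objects too
class(f)        # "function"
length(f)       # even a function has a length

# print() is generic: the method is chosen according to the class
d <- as.Date("2019-12-05")
class(d)
print(d)
print(unclass(d))   # without the class, the underlying number is shown
```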


Find Help

Integrated help

• help.start()

• help("cor"), ?cor

• help.search("correlation"), ??correlation

Find Help

Internet

• check r-cran

• cran taskview

• cran discussions/mailing lists/examples

• manuals / vignettes

• R news/journal

• search for snippets - anything


Find Help

Books

• see the literature on the References slide, and many more.

Faithful - example


Workspace and working directory

Workspace - save what?

• workspace keeps your R-objects

• workspace is located in your RAM

• different environments in your workspace


Workspace and working directory

Workspace - try out

> ls()

?

> search()

?

> ls(name="package:datasets")

?
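As a rough guide to the question marks above: ls() lists the objects in the workspace, search() lists the attached environments and packages, and ls() with a package name lists what that package provides. The object a is a made-up example.

```r
a <- 1                           # create an object in the workspace
ls()                             # names of the objects in the current environment
search()                         # attached packages and environments, in search order
head(ls("package:datasets"))     # first few data sets shipped with R
exists("faithful")               # found via the search path, although not in ls()
```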


Workspace and working directory

Working directory - default folder for save/load etc. - try out

> getwd()

[1] "/Users/moritz"

> setwd("/Users/moritz/Documents/Teaching/2019_2020/Urfist")

Data types - mode()

Description               Example       Data type

Basic data types (hierarchical)

empty set                 NULL          NULL
boolean                   TRUE          logical
integers & reals          3.14          numeric
complex numbers           2.13+1i       complex
characters and strings    "Hello"       character

Composite data type

factors                   blue & red    factor
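The table can be checked with mode(). The sketch below also illustrates what "hierarchical" means here: when types are mixed in one vector, R converts everything to the most general type involved.

```r
mode(NULL)                      # "NULL"
mode(TRUE)                      # "logical"
mode(3.14)                      # "numeric"
mode(2.13+1i)                   # "complex"
mode("Hello")                   # "character"

f <- factor(c("blue", "red"))
class(f)                        # "factor"
mode(f)                         # "numeric": factors are stored as integer codes

# Hierarchy: combining types coerces to the most general one
mode(c(TRUE, 3.14))             # "numeric"
mode(c(TRUE, 3.14, 2+1i))       # "complex"
mode(c(TRUE, 3.14, "Hello"))    # "character"
```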


Logical values in R

Try and explain

> 4 < 3

> (3+1) !=3

> -3<-2

> -3 < -2

> (3 >= 2) & (4 == (3+1))

> c(TRUE, FALSE) & c(TRUE, TRUE)

> x <- c(-4, -5, -1, 0, 2, 4, 5)

> x > 0

> sum(x > 0)
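A possible walk-through of the expressions above. The one surprise is -3<-2: without spaces R reads <- as the assignment operator, so the line is an attempt to assign to -3 and fails. In the sketch the failing line is wrapped in try() so that the script keeps running; res is a made-up name.

```r
4 < 3                              # FALSE
(3 + 1) != 3                       # TRUE
res <- try(eval(parse(text = "-3<-2")), silent = TRUE)
inherits(res, "try-error")         # TRUE: read as an assignment, which fails
-3 < -2                            # TRUE: with spaces it is a comparison
(3 >= 2) & (4 == (3 + 1))          # TRUE
c(TRUE, FALSE) & c(TRUE, TRUE)     # TRUE FALSE (element by element)

x <- c(-4, -5, -1, 0, 2, 4, 5)
x > 0                              # one logical value per element
sum(x > 0)                         # 3: TRUE counts as 1, FALSE as 0
```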


Logical values in R

Function, Operator, Value    Description

==, !=                       equal, not equal
>, >=                        greater, greater or equal
<, <=                        smaller, smaller or equal
!                            not

&, &&                        and (vector, non-vector)
|, ||                        or (vector, non-vector)
xor()                        exclusive or

TRUE, FALSE                  true, false

Logical values in R

Try - any(), all(), which()

> x <- c(2, 0, 3, 1, 6, 9, 7)

> someTrue <- x > 4

> allTrue <- x >= 0

> any(someTrue)

> all(allTrue)

> any(allTrue)

> all(someTrue)

> !any(allTrue)

> all(!someTrue)

> all(!allTrue)

> which(someTrue)

> which(!someTrue)
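For reference, a selection of the calls above with their results as comments: any() asks whether at least one element is TRUE, all() whether every element is TRUE, and which() returns the positions of the TRUE elements.

```r
x <- c(2, 0, 3, 1, 6, 9, 7)
someTrue <- x > 4     # FALSE FALSE FALSE FALSE TRUE TRUE TRUE
allTrue  <- x >= 0    # TRUE for every element

any(someTrue)         # TRUE
all(someTrue)         # FALSE
all(allTrue)          # TRUE
all(!someTrue)        # FALSE
which(someTrue)       # 5 6 7
which(!someTrue)      # 1 2 3 4
x[which(someTrue)]    # 6 9 7
```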


Logical values in R

Not available - NA - conversion into other data types

> x <- NA

> mode(x)

>

> y <- c(3, x)

> mode(y)


Logical values in R

Not available - NA - is special

> # convert NA

> x <- NA

> mode(x)

> y <- c(3, x)

> mode(y)

> # comparison and calculation

> # - (if a part is unknown - result will be unknown)

> y > 2

> sum(y)

> # do not consider (remove) NA elements

> sum(y, na.rm=TRUE)
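The results of the calls above, written out as a sketch. The last two lines go slightly beyond the slide: is.na() is the usual test for missing values, because a comparison with NA is itself NA.

```r
x <- NA
mode(x)                  # "logical": a bare NA is a logical value
y <- c(3, x)
mode(y)                  # "numeric": NA is converted to the type of the vector
y > 2                    # TRUE NA
sum(y)                   # NA
sum(y, na.rm = TRUE)     # 3
is.na(y)                 # FALSE TRUE
y == NA                  # NA NA: comparing with NA never gives TRUE
```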


Numeric values in R

Try - mode(), typeof()

> (x <- pi)

[1] 3.141593

> mode(x)

[1] "numeric"

> typeof(x)

[1] "double"

> (y <- as.integer(x)) # information loss

[1] 3

> typeof(y)

[1] "integer"


Numeric values in R

Try - conversion

> is.character(y)

[1] FALSE

> x <- -1

> -1^0.5

[1] -1

> sqrt(as.complex(x))

[1] 0+1i

> (z <- as.character(y))

[1] "3"

> as.numeric("z")

[1] NA

Warning message:

NAs introduced by coercion

> (y <- as.numeric("-1"))

[1] -1

> y*5

[1] -5
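Two details in the transcript above are easy to miss: -1^0.5 is evaluated as -(1^0.5) because ^ binds more tightly than the unary minus, and as.numeric("z") converts the letter "z", not the object z. A short sketch of both:

```r
-1^0.5                  # -1: evaluated as -(1^0.5)
(-1)^0.5                # NaN: no real square root of -1
sqrt(as.complex(-1))    # 0+1i

y <- 3L
z <- as.character(y)    # "3"
as.numeric(z)           # 3: the object z is converted
suppressWarnings(as.numeric("z"))   # NA: the letter "z" is not a number
```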


Character values in R

There are just strings. Manipulating them is another story. Next course.

"hello world"

a <- "hello world"


Factors in R

(ordered) factors

gulliver <- factor(c("dwarf","giant", "giant",
                     "giant","dwarf"))

gulliver <- factor(c("dwarf","giant", "giant",
                     "giant","dwarf"),
                   levels=c("giant","dwarf","gulliver"))

gulliver <- ordered(c("dwarf","giant", "giant",
                      "giant","dwarf"),
                    levels=c("dwarf","gulliver","giant"))
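The three definitions differ in their levels. The sketch below stores them under separate, made-up names (size, f1, f2, f3) so that the effect of each call can be compared.

```r
size <- c("dwarf", "giant", "giant", "giant", "dwarf")

f1 <- factor(size)
levels(f1)             # "dwarf" "giant": sorted alphabetically by default

f2 <- factor(size, levels = c("giant", "dwarf", "gulliver"))
table(f2)              # the unused level "gulliver" is counted with 0

f3 <- ordered(size, levels = c("dwarf", "gulliver", "giant"))
as.integer(f3)         # 1 3 3 3 1: internal integer codes
f3[1] < f3[2]          # TRUE: comparisons are defined for ordered factors
```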


Factors in R

(ordered) factors

> class(gulliver)

[1] "ordered" "factor"

> str(gulliver)

Ord.factor w/ 3 levels "dwarf"<"gulliver"<..: 1 3 3 3 1

> mode(gulliver)

[1] "numeric"

> length(gulliver)

[1] 5


Data structures

Data structure is the representation of data, which defines how we operate on the data.

Data structures in R

• vector

• matrix

• array

• list

• data.frame

For each data structure you have a function to declare & initialise an object, e.g. vector().
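A minimal sketch of such constructor calls, one per data structure. The object names and sizes are arbitrary choices for illustration.

```r
v <- vector("numeric", length = 3)       # 0 0 0
m <- matrix(0, nrow = 2, ncol = 3)       # 2 x 3 matrix of zeros
a <- array(0, dim = c(2, 3, 4))          # 3-dimensional array
l <- vector("list", length = 2)          # list with two NULL entries
d <- data.frame(id = integer(0), name = character(0))  # empty data.frame with two columns

str(v); dim(m); dim(a); length(l); names(d)
```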


Vector

characteristics of vectors

> x <- c(4.1, 4.3, 2, 54, 34) # c() for concatenate

>

> class(x)

[1] "numeric"

> mode(x)

[1] "numeric"

> typeof(x)

[1] "double"

> str(x)

num [1:5] 4.1 4.3 2 54 34

> is.vector(x)

[1] TRUE

> length(x)

[1] 5

> attributes(x)

NULL

Vector

characteristics of vectors

> # Only of one data type

> c(x, FALSE, TRUE)

[1] 4.1 4.3 2.0 54.0 34.0 0.0 1.0

> c(x, FALSE, TRUE, "Hello")

[1] "4.1" "4.3" "2" "54" "34" "FALSE" "TRUE" "Hello"


Vector

characteristics of vectors

> x <- c(height=10, width=4)

> x

height width

10 4

> attributes(x)

$names

[1] "height" "width"

> names(x)

[1] "height" "width"


Vector

sequences

> 2:5

[1] 2 3 4 5

> seq(2,5,by=2)

[1] 2 4

> rep(2:4,times=3)

[1] 2 3 4 2 3 4 2 3 4


Vector

Calculate with vectors

> x <- c(2, 4, 8)

> x*2

[1] 4 8 16

> x-3

[1] -1 1 5

>

> x * c(3,2,1)

[1] 6 8 8

> x * c(3,2)

[1] 6 8 24

Warning message:

In x * c(3, 2) :

longer object length is not a multiple of shorter object length
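The warning above comes from recycling: when two vectors differ in length, R repeats the shorter one until it matches the longer one. A short sketch of the cases:

```r
x <- c(2, 4, 8)
x * 2                 # the single value 2 is recycled: 4 8 16
x * c(3, 2, 1)        # same length: element by element, 6 8 8

# Length 3 against length 2: c(3, 2) is extended to c(3, 2, 3), with a warning
suppressWarnings(x * c(3, 2))   # 6 8 24

# Length 6 against length 2: recycled three times, no warning
1:6 * c(1, 10)        # 1 20 3 40 5 60
```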


Vector

Calculate with vectors

> t(2:4) %*% 1:3

[,1]

[1,] 20

> 2:4 %*% 1:3

[,1]

[1,] 20

> 2:4 %*% t(1:3)

[,1] [,2] [,3]

[1,] 2 4 6

[2,] 3 6 9

[3,] 4 8 12


Vector

Indexing vectors

> x

[1] 2 4 8

> x[2] # 2nd element by position

[1] 4

> x[c(FALSE,TRUE,FALSE)] # 2nd element with logical value

[1] 4

> names(x) <- c("Height","Length","Width")

> x["Length"] # access by name

Length

4

> x[-2] # without 2nd element

[1] 2 8


Vector

Indexing vectors

# vector x

x <- c(2, 4, 8)

# extract by boolean vector

idx <- x < 5

print(idx)

x[idx]

# overwrite

x[idx] <- 2

# extract by position vector

pos <- which(x < 5)

print(pos)

x[pos]

# overwrite

x[pos] <- 1

Vector

Indexing vectors

> x[3,2,1] # reverse order?

Error in x[3, 2, 1] : incorrect number of dimensions

> x[c(3,2,1)] # reverse order!

[1] 8 4 2

> x[] <- -2 # replace every element by -2

> x

[1] -2 -2 -2

> x <- -2 # overwrite x by -2

> x

[1] -2


Matrix

initialise a Matrix - try and explain

x <- 1:3

y <- 4:6

cbind(x,y)

rbind(x,y)

matrix(1:6, nrow=3, ncol=2)

matrix(1:6, nrow=3, byrow=TRUE)

matrix(1:6, ncol=2, byrow=TRUE)

matrix(1:6, nrow=3, ncol=4, byrow=TRUE)
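A possible explanation of the calls above: cbind() and rbind() glue vectors together as columns or rows, and matrix() fills column by column unless byrow=TRUE is given. The names m_col, m_row and m_big are made up for this sketch.

```r
x <- 1:3
y <- 4:6
dim(cbind(x, y))      # 3 2: x and y become columns
dim(rbind(x, y))      # 2 3: x and y become rows

m_col <- matrix(1:6, nrow = 3, ncol = 2)     # filled column by column (default)
m_row <- matrix(1:6, nrow = 3, byrow = TRUE) # filled row by row
m_col[1, ]            # 1 4
m_row[1, ]            # 1 2

# 6 values for 12 cells: the values are recycled
# (recent R versions may warn about the length mismatch, hence suppressWarnings)
m_big <- suppressWarnings(matrix(1:6, nrow = 3, ncol = 4, byrow = TRUE))
m_big[2, ]            # 5 6 1 2
```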


Matrix

Operate on matrices - try and explain

A <- matrix(c(1,1,2,2), nrow=2, byrow=TRUE)

B <- t(A) # transpose

solve(A) # inverse

# elementwise operations

A * B

A / B

A - B

A^B

A == B

A == t(B)

any(A==B)

# matrix multiplication

A %*% B
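Two points worth noting when trying the code above: * works element by element while %*% is the matrix product, and the matrix A used here is singular, so solve(A) stops with an error. The sketch wraps that call in try(); M is a hypothetical invertible matrix that does not come from the slides.

```r
A <- matrix(c(1, 1, 2, 2), nrow = 2, byrow = TRUE)
B <- t(A)

A * B          # elementwise product
A %*% B        # matrix product (rows of A times columns of B)

# A is singular (second row = 2 * first row), so solve(A) fails
inv_A <- try(solve(A), silent = TRUE)
inherits(inv_A, "try-error")     # TRUE

# M is a hypothetical invertible matrix, not taken from the slides
M <- matrix(c(2, 0, 1, 1), nrow = 2, byrow = TRUE)
M %*% solve(M)                   # identity matrix (up to rounding)
```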


Matrix

Access matrix elements - exercise

> X <- matrix(1:6, nrow=3)

> rownames(X) <- c("length", "width", "height")

> colnames(X) <- c("shelf","chair")

• read length of shelf by [row pos., column pos.]

• read length of shelf by [row name, column name]

• read length of shelf by [row name, column pos.]

• overwrite length of shelf with 14

• get all dimensions of shelf

• get length of all furniture

• overwrite length of furniture by prior length times 100
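One possible solution to the exercise, as a sketch rather than the only correct answer:

```r
X <- matrix(1:6, nrow = 3)
rownames(X) <- c("length", "width", "height")
colnames(X) <- c("shelf", "chair")

X[1, 1]                  # by row and column position
X["length", "shelf"]     # by row and column name
X["length", 1]           # by row name and column position
X[1, 1] <- 14            # overwrite length of shelf
X[, "shelf"]             # all dimensions of the shelf
X["length", ]            # length of all furniture
X["length", ] <- X["length", ] * 100   # prior length times 100
X
```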


Matrix

Matrix features

> class(X)

[1] "matrix" "array" # R versions before 4.0.0 print only "matrix"

> typeof(X)

[1] "integer"

> mode(X)

[1] "numeric"


Matrix

Matrix features

> attributes(X)

$dim

[1] 3 2

$dimnames

$dimnames[[1]]

[1] "length" "width" "height"

$dimnames[[2]]

[1] "shelf" "chair"

Matrix

Matrix features

> str(X)

int [1:3, 1:2] 1 2 3 4 5 6

- attr(*, "dimnames")=List of 2

..$ : chr [1:3] "length" "width" "height"

..$ : chr [1:2] "shelf" "chair"

Matrices are special cases of vectors in R (in mathematics it is the other way round, of course).

Array

Array - like matrix but possibly more dimensions

Avoid arrays.

Don't use arrays with more than 3 dimensions if not absolutely necessary.

Use map of array.

List

List - features

• very flexible,

• may contain different data types,

• also lists (recursive)


List

List - features

> (L1 <- list("this is a", 100, matrix(1:4,nrow=2),

list(TRUE,FALSE)))

[[1]]

[1] "this is a"

[[2]]

[1] 100

[[3]]

[,1] [,2]

[1,] 1 3

[2,] 2 4

[[4]]

[[4]][[1]]

[1] TRUE

[[4]][[2]]

[1] FALSE


List

List - features

> (L1 <- list(a="this is a", b=100, c=matrix(1:4,nrow=2),

truth=list(TRUE,FALSE)))

$a

[1] "this is a"

$b

[1] 100

$c

[,1] [,2]

[1,] 1 3

[2,] 2 4

$truth

$truth[[1]]

[1] TRUE

$truth[[2]]

[1] FALSE


List

List - features

> L1[[1]] # access by position

[1] "this is a"

> L1[["a"]] # access by name

[1] "this is a"

>

> L1$b # access by name (but different)

[1] 100
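The "but different" remark on $ can be made concrete. In the sketch below, single brackets return a sub-list, double brackets return the element itself, and $ additionally accepts abbreviated names.

```r
L1 <- list(a = "this is a", b = 100, c = matrix(1:4, nrow = 2),
           truth = list(TRUE, FALSE))

class(L1[1])        # "list": single brackets return a sub-list
class(L1[[1]])      # "character": double brackets return the element itself
L1[["c"]][2, 1]     # 2: index into the extracted matrix
L1$truth[[2]]       # FALSE: nested access
L1$tr               # $ allows partial matching of names
L1[["tr"]]          # NULL: [[ ]] requires the exact name by default
```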


List

List - features

> L1[[1]] <- NULL

> L1

$b

[1] 100

$c

[,1] [,2]

[1,] 1 3

[2,] 2 4

$truth

$truth[[1]]

[1] TRUE

$truth[[2]]

[1] FALSE

data.frames

data.frames

• the most typical structure for data sets

• data.frames are lists - but entries are vectors with same length

• data.frame()


Data.frame

initialise Data.frame

> shopping <- data.frame(Product=c("cheese","wine","bread"),

+ Unit=c("grams","bottles","loaf"), Amount=c(300,2,2),

+ stringsAsFactors=TRUE) # the default is FALSE since R 4.0.0

>

> str(shopping)

'data.frame': 3 obs. of 3 variables:

$ Product: Factor w/ 3 levels "bread","cheese",..: 2 3 1

$ Unit : Factor w/ 3 levels "bottles","grams",..: 2 1 3

$ Amount : num 300 2 2
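That a data.frame is a list of equally long columns can be checked directly. In this sketch the Price column and its values are invented for illustration.

```r
shopping <- data.frame(Product = c("cheese", "wine", "bread"),
                       Unit = c("grams", "bottles", "loaf"),
                       Amount = c(300, 2, 2))

is.list(shopping)          # TRUE: a data.frame is a list of columns
length(shopping)           # 3: number of columns
dim(shopping)              # 3 3: rows and columns
sapply(shopping, length)   # every column has the same length

shopping$Price <- c(4.5, 12, 2.2)   # hypothetical new column, added like a list entry
names(shopping)
```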


Data.frame

Data.frame - access elements

> shopping[2,]

Product Unit Amount

2 wine bottles 2

> shopping[2,3]

[1] 2

> shopping[2,"Amount"]

[1] 2

> shopping$Amount[2]

[1] 2


Data.frame

Plot two regimes of old faithful

[Figure: scatter plot of the faithful data; x-axis: waiting (40 to 100), y-axis: eruptions (1 to 6)]

Data.frame

Plot two regimes of old faithful (by subsetting data.frame)

# plot geyser data

plot(NA, xlim=c(40,100),ylim=c(1,6), xlab="waiting",

ylab="eruptions")

points(regime1[,"waiting"],regime1[,"eruptions"],col="red")

points(regime2[,"waiting"],regime2[,"eruptions"],col="blue")

points(rest[,"waiting"],rest[,"eruptions"],col="grey")

lines(x=c(70,70),y=c(0,7),lty=3)

lines(x=c(40,100),y=c(3.5,3.5),lty=3)
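The plotting code above uses regime1, regime2 and rest, which are not defined on the slide. A sketch of how they could be built by subsetting, assuming that the two dotted lines (waiting = 70, eruptions = 3.5) are the thresholds; the thresholds, the helper names short and long, and the assignment of regimes to colours are assumptions.

```r
# Assumed thresholds, read off the dotted lines: waiting = 70, eruptions = 3.5
short <- faithful$waiting < 70 & faithful$eruptions < 3.5
long  <- faithful$waiting >= 70 & faithful$eruptions >= 3.5

regime1 <- faithful[short, ]            # short wait, short eruption
regime2 <- faithful[long, ]             # long wait, long eruption
rest    <- faithful[!short & !long, ]   # everything else

# Every observation ends up in exactly one subset
nrow(regime1) + nrow(regime2) + nrow(rest) == nrow(faithful)
```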

Data.frame

Plot two regimes of old faithful (by adding variables to data.frame)

# plot geyser data

plot(faithful$waiting,faithful$eruptions, col=faithful$color,

xlim=c(40,100),ylim=c(1,6),

xlab="waiting", ylab="eruptions")

lines(x=c(70,70),y=c(0,7),lty=3)

lines(x=c(40,100),y=c(3.5,3.5),lty=3)
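The color column used above is not created on the slide. One way to add it, again assuming the thresholds 70 and 3.5 from the dotted lines; the sketch works on a copy named geyser (a made-up name), whereas the slide stores the column in faithful itself.

```r
geyser <- faithful          # work on a copy of the built-in data set
geyser$color <- "grey"      # default colour for all observations
geyser$color[geyser$waiting < 70 & geyser$eruptions < 3.5] <- "red"
geyser$color[geyser$waiting >= 70 & geyser$eruptions >= 3.5] <- "blue"
table(geyser$color)
```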


Conditional expressions

If...else

> x <- 1

> if(x > 0){

+ print("x is pos.")

+ }else{

+ print("x is neg.")

+ }

[1] "x is pos."

Conditional expressions

ifelse()

# vector-oriented, use for simple expr.

> x <- c(1,2,3,4)

> ifelse(x > 2, "A", "B")

[1] "B" "B" "A" "A"


Conditional expressions

switch

> switch(2, a=11, b=12, cc=13, d=14)

[1] 12

> (switch("c", a=11, b=12, cc=13, d=14))

NULL

> switch("cc", a=11, b=12, cc=13, d=14)

[1] 13
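The three calls above show that switch() selects by position for a number and by exact name for a string, and returns NULL (invisibly) when nothing matches. The sketch adds two further features of switch(): a default value and fall-through.

```r
switch(2, a = 11, b = 12, cc = 13, d = 14)        # number: selects by position
switch("cc", a = 11, b = 12, cc = 13, d = 14)     # string: selects by exact name
is.null(switch("c", a = 11, b = 12, cc = 13))     # TRUE: no match, no default

# An unnamed last argument acts as the default for string input
switch("c", a = 11, b = 12, cc = 13, "no match")

# An empty alternative falls through to the next non-empty one
switch("a", a = , b = "a or b", cc = "cc")
```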


Loops

for()

> for(i in 1:3){

+ print(i)

+ }

[1] 1

[1] 2

[1] 3


Loops

repeat

> i <- 0

> repeat{

+ i <- i+1

+ if(i < 3){

+ next # start next turn

+ }

+ print(i)

+

+ if(i == 3){

+ break # exit loop

+ }

+

+ }

[1] 3

Loops

while

> i <- 0

> while(i < 3){

+ print(i)

+ i <- i + 1

+ }

[1] 0

[1] 1

[1] 2


Loops in R

Short note on loops and efficiency

• R is executed by an interpreter (not compiled). The interpreter needs to translate every pass through a loop into machine code.

• Vectorized code is typically faster (translated once; fetching, calculating and writing each value is 'compressed'), but it

  • may be slower (if the size of the vector becomes too large),

  • may be difficult to read (if several objects are handled),

  • may not be possible (if each step depends on the result of the previous one).
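A small sketch of the trade-off: the same sum of squares once as a loop and once vectorized, followed by a recursion where each value needs the previous one and a loop is the natural choice. The recursion is a made-up example; timings are left out because they depend on the machine.

```r
x <- as.numeric(1:100000)

# Loop version: one interpreted step per element
total_loop <- 0
for (xi in x) {
  total_loop <- total_loop + xi^2
}

# Vectorized version: the loop runs inside compiled code
total_vec <- sum(x^2)

all.equal(total_loop, total_vec)

# Sequential dependence: each value needs the previous one
# (hypothetical recursion chosen for illustration)
v <- numeric(10)
v[1] <- 1
for (i in 2:10) {
  v[i] <- 0.5 * v[i - 1] + 1
}
v
```

Wrapping either version in system.time() shows the difference in run time on a given machine.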

Exercise

• Exercise ‘R shaked’

• Exercise ‘Normal Distribution uncommented’


Read/Write

Import/export, read/write

• R objects (load(), save())

• text-files (read.table() / write.table(), scan() / cat())

• operate on Excel/Access/databases (e.g. RODBC or RMySQL)

• SAS/SPSS/Stata (e.g. foreign)

See “R data import / export”.
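A minimal round trip for the first two bullet points, using temporary files so that nothing in the working directory is touched. The data frame df and the file names are made up for this sketch.

```r
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))

# Text file: write and read back (temporary file, path chosen by R)
txt <- tempfile(fileext = ".txt")
write.table(df, file = txt, sep = "\t", row.names = FALSE)
df_txt <- read.table(txt, header = TRUE, sep = "\t")

# R objects: save() stores objects by name, load() restores them
rda <- tempfile(fileext = ".RData")
save(df, file = rda)
rm(df)
load(rda)          # df exists again

all.equal(df, df_txt)
```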


Character string handling

Function               Description

scan() / cat()         read from / print to file (or console)
parse() / deparse()    convert character string into expression (and vice versa)
grep()                 regular expressions
nchar()                number of characters in a string
strsplit() / paste()   split / concatenate strings

See e.g. Sanchez, G. (2013) Handling and Processing Strings in R. Trowchez Editions, Berkeley. http://www.gastonsanchez.com/Handling and Processing Strings in R.pdf
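The functions in the table can be tried on a short string. A sketch with made-up inputs:

```r
s <- "hello world"
nchar(s)                                   # 11
strsplit(s, " ")[[1]]                      # "hello" "world"
paste("hello", "world", sep = "_")         # "hello_world"
grep("an", c("banana", "cherry", "mango")) # 1 3: positions of matching elements
e <- parse(text = "1 + 2")                 # character string -> expression
eval(e)                                    # 3
deparse(quote(x + 1))                      # expression -> "x + 1"
cat("Result:", eval(e), "\n")              # print to the console
```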


Data Merge/Selection

How to join and split data.frames?

• merge(),

• rbind()

• cbind()

• subset()

• idx <- x == y; df[idx, c("A","B")]

Note: Merging/splitting very large tables (millions of entries) is better done in a database.
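The functions listed above, applied to two small tables. The data frames df1 and df2 and their contents are invented for this sketch.

```r
df1 <- data.frame(id = 1:3, A = c(10, 20, 30))
df2 <- data.frame(id = 2:4, B = c("x", "y", "z"))

merge(df1, df2, by = "id")               # inner join: ids 2 and 3
merge(df1, df2, by = "id", all = TRUE)   # outer join: ids 1 to 4, NA where missing

rbind(df1, data.frame(id = 4, A = 40))   # append rows (same columns)
cbind(df1, C = c(TRUE, FALSE, TRUE))     # append a column (same number of rows)

subset(df1, A > 15, select = c(id, A))   # rows by condition, columns by name

idx <- df1$A == 20                       # logical index vector
df1[idx, c("id", "A")]
```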


R plotting

How to make good graphs

1. Excel plots

2. R plots - various devices - pdf, ps etc.

3. TikZ plots (vector graphics in LaTeX) - tikzDevice - NEW

4. Plotly - interactive web-based graphs via the open source JavaScript graphing library - NEWER

R plotting

Devices

• Device defines where to plot, similar to paper, parchment etc.

• the same graph gets a different 'look and feel' on different devices

• (transparent) colours differ by device


R plotting - pdf-device

[Figure: "Regional spread of Greek GDP"; x-axis: Year (2000 to 2009), y-axis: k Euro per capita (10 to 30)]

Scientific plotting - tikzDevice

[Figure: "Regional spread of Greek GDP"; x-axis: Year (2000 to 2010), y-axis: k Euro per capita (10 to 30)]

R plotting

Typical process

Step 0: try out on the default device.

f <- paste(getwd(),"/GreekGDP.pdf",sep="") # define file

pdf(file=f, width=4, height=3) # open device

# plot into device

boxplot(gdp2[idx,"EUR_HAB"]/1000~gdp2[idx,"TIME"],notch=TRUE, xlab="Year", ylab="k Euro per capita",

main="Regional spread of Greek GDP")

dev.off() # close device
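The GDP data (gdp2, idx) are not part of the slides, so the code above cannot be run as it stands. The same open / plot / close pattern can be sketched with the built-in faithful data; the file name and the plot are arbitrary choices, and file.path() is used here in place of paste().

```r
f <- file.path(tempdir(), "faithful.pdf")   # hypothetical output file
pdf(file = f, width = 4, height = 3)        # open device
boxplot(faithful$eruptions, ylab = "eruptions",
        main = "Old Faithful eruptions")    # plot into device
dev.off()                                   # close device: the file is written now
file.exists(f)
```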


R plotting

Plot functions

• plot() - context dependent / generic function - depends on the class of the object to plot.

• add to an existing plot - points(), lines(), text().

• barplot(), boxplot(), contour(), hist(), etc.

• plot functions from special packages


R plotting

• options in the function, e.g. cex

• options through par()


R plotting

Arguments frequently used in plot functions and par()

Argument      Description

axes          draw axes or not (can be fine-tuned later)
cex           size (multiplier) of symbols and text
lty, lwd      line type, line width
pch           point symbol
xlab, ylab    axis labels
xlim, ylim    axis ranges

R plotting - regions

par()-ameters oma/mar [3]

[3] http://rgraphics.limnology.wisc.edu/rmargins sf.php

R plotting - regions

par()-ameters mfcol/mfrow [4]

[4] http://rgraphics.limnology.wisc.edu/rmargins mfcol.php

R plotting

General remarks

• Everything can be done,

• But may cost time to figure out how, and to code.


Descriptives

Check out

summary(), table(), prop.table(), cor(), rcorr() (package Hmisc), etc.
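A short sketch of the base R functions from this list, applied to the built-in faithful data. The cut-off of 70 minutes is an assumption made only to have a categorical variable to tabulate.

```r
summary(faithful)                          # quartiles, mean, minimum and maximum per column
long_wait <- faithful$waiting >= 70        # assumed cut-off for illustration
table(long_wait)                           # absolute frequencies
prop.table(table(long_wait))               # relative frequencies
cor(faithful$eruptions, faithful$waiting)  # Pearson correlation
```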


Descriptives - free work

What's the chance of a happy end? Use the 'Titanic' data in the library 'datasets'.
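One possible starting point, not a full solution: Titanic is a four-dimensional table (Class, Sex, Age, Survived), so margins and proportions can be computed directly. The name surv is made up for this sketch.

```r
dimnames(Titanic)                      # Class, Sex, Age, Survived
surv <- margin.table(Titanic, 4)       # counts by Survived
prop.table(surv)                       # overall share of survivors
prop.table(margin.table(Titanic, c(1, 4)), 1)   # survival share within each class
```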


Descriptives - free work

Bring your project to the class room

YOUR PROJECT HERE.


Beyond this course


Inductive stats

Check out

lm(), glm(), task view
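A minimal lm() sketch on the built-in faithful data; the choice of model is only an illustration. glm() uses the same formula interface with an additional family argument.

```r
fit <- lm(eruptions ~ waiting, data = faithful)   # linear regression
coef(fit)                                         # intercept and slope
summary(fit)$r.squared                            # share of explained variance
predict(fit, newdata = data.frame(waiting = 80))  # predicted eruption time for an 80 minute wait
```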


Non-parametrics

Check out

density() in R base, np package, task view

Bayesian

Check out

rstan package, task view