Initiation to R

Nicolas Sutton-Charani

Initiation to R 2019-04-29 1 / 28


1. Introduction

2. Software installation

3. Types and basic operations with R

4. Data import

5. Data simulation

6. Plots

7. Packages

8. Useful functions


Introduction

What is R ?

Programming language and statistical computing environment for data analysis

GNU package (written mainly in C and Fortran)

freely available under GNU General Public License

collaborative project

Comprehensive R Archive Network (CRAN)

History

1975: J. Chambers (Bell Laboratories) → S

1995: R. Ihaka and R. Gentleman (University of Auckland) → R


Employment

[Figure not recovered. Source: http://r4stats.com/articles/popularity/]


R vs Python

[Figure not recovered. Source: http://r4stats.com/articles/popularity/]


Analytic tool

[Figure not recovered. Source: http://r4stats.com/articles/popularity/]


Analytic tool

[Figure not recovered. Source: http://r4stats.com/articles/popularity/]


Software installation

R : the software

http://www.r-project.org/ → CRAN → choose one of the French mirrors → Download R for Windows/Mac/Linux → base → Download R 3.5.1 for XXX

RStudio: development environment

https://www.rstudio.com/ → Download RStudio → RStudio Desktop, Open Source License: Download → choose the correct installer

Run the two installers (.exe files on Windows)


Types and basic operations with R

Variable types

No type declaration! A variable takes the type of the R object assigned to it. Main object structures:

Vectors

Lists

Matrices

Arrays

Factors

Data Frames

Data types :

logical (TRUE, FALSE)

numeric (ex : 12.3, 5, 999)

character (ex: "a", "good", "TRUE", "23.4"), or factor when all modalities (levels) are known
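As a small illustration of the data types listed above, the following sketch checks the type of a few values with class(). The variable names (flag, x, label, grade) are hypothetical and chosen only for this example.

```r
# Illustrative values only; the variable names are hypothetical.
flag  <- TRUE
x     <- 12.3
label <- "good"
grade <- factor(c("low", "high", "low"), levels = c("low", "high"))

class(flag)     # "logical"
class(x)        # "numeric"
class(label)    # "character"
class(grade)    # "factor"
levels(grade)   # "low" "high"

# Values written in quotes are character strings, whatever they look like
class("TRUE")        # "character"
as.numeric("23.4")   # 23.4 (explicit conversion)
```

No declaration is needed: each variable simply takes the type of the value assigned to it.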


Operators

Arithmetic               Comparison                       Logical
+    addition            <   less than                    !x         logical NOT
-    subtraction         >   greater than                 x & y      logical AND (element-wise)
*    multiplication      <=  less than or equal to        x && y     same, for single values
/    division            >=  greater than or equal to     x | y      logical OR (element-wise)
^    power               ==  equal to                     x || y     same, for single values
%%   modulo              !=  not equal to                 xor(x, y)  exclusive OR
%/%  integer division
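A short sketch of how some of these operators behave may help. The vectors a and b are made up for the example; the point is the difference between the element-wise forms (&, |) and the single-value forms (&&, ||).

```r
7 %% 3     # 1 (remainder of the division)
7 %/% 3    # 2 (integer division)
2 ^ 3      # 8

a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b       # TRUE FALSE FALSE (element-wise AND)
a | b       # TRUE TRUE TRUE   (element-wise OR)
xor(a, b)   # FALSE TRUE TRUE
!a          # FALSE TRUE FALSE

# && and || work on single values, typically inside if ()
ok <- (5 > 3) && (2 != 2)   # FALSE
```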


Vectors

Value assignment: '<-' or '=' (ex: x <- 3 or x = 3)

Data generation

> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(-3, +3, length = 13)
 [1] -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5  3.0
> v <- c(4, 10, 16)
> v[3]
[1] 16

Functions on vectors

mean(), sum(), median()

var() and sd()

length()

summary()
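The functions listed above can be tried on the small vector used on this slide. This is only a sketch; the vector is redefined here so that the block runs on its own.

```r
v <- c(4, 10, 16)

mean(v)      # 10
sum(v)       # 30
median(v)    # 10
var(v)       # 36 (sample variance, denominator n - 1)
sd(v)        # 6  (square root of var(v))
length(v)    # 3
summary(v)   # minimum, quartiles, mean and maximum in one call
```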


Matrices

> m <- matrix(data = 1:12, nrow = 3, ncol = 4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> m[2, 3]
[1] 8
> dim(m)
[1] 3 4
> cbind(m, v)
               v
[1,] 1 4 7 10  4
[2,] 2 5 8 11 10
[3,] 3 6 9 12 16
> rbind(m, c(v, 5))
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
[4,]    4   10   16    5
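As a complement, here is a brief sketch of whole-row and whole-column extraction, and of the byrow argument of matrix(). The name mr is hypothetical; m is rebuilt so that the block is self-contained.

```r
m <- matrix(1:12, nrow = 3, ncol = 4)   # filled column by column (default)

m[2, ]   # second row:   2 5 8 11
m[, 3]   # third column: 7 8 9

mr <- matrix(1:12, nrow = 3, byrow = TRUE)   # filled row by row
mr[2, 3]   # 7

dim(t(m))  # 4 3 (t() transposes the matrix)
```

Leaving an index empty selects the whole row or column, which is the most common way to slice a matrix.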


Dataframes

similar to a matrix, but stored as a list of columns

columns can hold different types of data

columns are indexed by name → code is insensitive to column reordering!

> df <- data.frame(id = c("id1", "id2", "id3", "id4", "id5"),
+                  poids = c(85, 78, 56, 102, 91),
+                  taille = c(170, 176, 155, 187, 202))
> df
   id poids taille
1 id1    85    170
2 id2    78    176
3 id3    56    155
4 id4   102    187
5 id5    91    202

(poids = weight, taille = height)

call columns by their names: df$poids (vector) or df['poids'] (data frame)
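The difference between the two ways of calling a column can be checked directly. This sketch uses a shortened version of the data frame; the names w1, w2 and w3 are hypothetical.

```r
df <- data.frame(id = c("id1", "id2", "id3"),
                 poids = c(85, 78, 56))

w1 <- df$poids        # numeric vector
w2 <- df["poids"]     # data frame with a single column
w3 <- df[["poids"]]   # numeric vector, same result as df$poids

class(w1)        # "numeric"
class(w2)        # "data.frame"
mean(df$poids)   # 73
```

Functions such as mean() expect a vector, so df$poids or df[["poids"]] is usually the form to pass to them.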


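A data frame is a list of equal-length vectors, which may hold different variable types → function class():
> class(df)
[1] "data.frame"
> class(df$taille)
[1] "numeric"
> class(df$id)
[1] "factor"

(output from R 3.5.1; since R 4.0.0, data.frame() keeps strings as "character" unless stringsAsFactors = TRUE)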

Variable names → column selection and row filtering:
> df1 <- subset(df, select = c(id, taille))   # equivalent to the next line
> df1 <- subset(df, select = -poids)
> df2 <- df[df$poids > 80, ]
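Selection and filtering can also be combined. The sketch below rebuilds the data frame of the previous slide; the names heavy, tall_heavy and by_height are hypothetical, and order() is an addition not shown on the slides.

```r
df <- data.frame(id = c("id1", "id2", "id3", "id4", "id5"),
                 poids = c(85, 78, 56, 102, 91),
                 taille = c(170, 176, 155, 187, 202))

# Filter rows and select columns in a single subset() call
heavy <- subset(df, poids > 80, select = c(id, poids))
nrow(heavy)   # 3

# Combine two conditions with the element-wise AND
tall_heavy <- df[df$poids > 80 & df$taille > 180, ]
as.character(tall_heavy$id)   # "id4" "id5"

# Sort the rows by increasing height
by_height <- df[order(df$taille), ]
as.character(by_height$id[1])   # "id3"
```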


Data import

Import

from a text or csv file:
> df <- read.table(file = "file.txt", sep = ";", dec = ",", header = TRUE)

from a csv file:
> df <- read.csv(file = "file.csv")

from an Excel file:
> library(readxl)
> df <- read_excel("my_file.xls")

from a database:
> library(RODBC)
> connexion <- odbcDriverConnect('driver={SQL Server};server=mysqlhost;database=mydbname;trusted_connection=true')
> df <- sqlQuery(connexion, 'SELECT * FROM information_schema.tables')
> odbcClose(connexion)
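To try the text import without an existing file, one possible sketch is a round trip through a temporary file. The data frame original and the file path are made up for the example; the sep and dec arguments match the read.table() call above (French-style files use ";" as separator and "," as decimal mark).

```r
# Write a small semicolon-separated file with decimal commas
path <- tempfile(fileext = ".txt")
original <- data.frame(id = c("a", "b"), value = c(1.5, 2.25))
write.table(original, file = path, sep = ";", dec = ",",
            row.names = FALSE, quote = FALSE)

readLines(path)   # "id;value" "a;1,5" "b;2,25"

# Read it back with the same separator and decimal mark
df <- read.table(file = path, sep = ";", dec = ",", header = TRUE)
df$value          # 1.50 2.25

# read.csv2() uses sep = ";" and dec = "," by default
df2 <- read.csv2(path)

unlink(path)      # remove the temporary file
```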


Data simulation

Simulations

Simple random sampling: s <- sample(N, size = 7, replace = FALSE) (7 values drawn without replacement from 1:N)

Random generation

Normal (Gauss) : v <- rnorm(n, mean = 0, sd = 1)

Poisson : v <- rpois(n, lambda)

Binomial : v <- rbinom(n, size, prob)

...

Corresponding probability density (or mass) functions: dnorm, dpois, dbinom, ...
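A minimal simulation sketch follows. The parameter values (n = 1000, lambda = 3, size = 10, prob = 0.5) and the seed are arbitrary choices for illustration; pnorm and qnorm are not on the slide but belong to the same family of functions.

```r
set.seed(42)   # fixes the random number generator so results are reproducible
n <- 1000

s      <- sample(10, size = 7, replace = FALSE)   # 7 distinct values from 1:10
v_norm <- rnorm(n, mean = 0, sd = 1)
v_pois <- rpois(n, lambda = 3)
v_bin  <- rbinom(n, size = 10, prob = 0.5)

mean(v_norm)   # close to 0
mean(v_pois)   # close to lambda = 3

# Density / mass functions (prefix d)
dnorm(0)                           # 0.3989423, i.e. 1 / sqrt(2 * pi)
dbinom(5, size = 10, prob = 0.5)   # 0.2460938, i.e. 252 / 1024

# Cumulative distribution (prefix p) and quantile (prefix q) functions
pnorm(1.96)    # about 0.975
qnorm(0.975)   # about 1.96
```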


Plots

> x <- seq(-10, +10, length = 10000)
> y <- cos(x)
> z <- dnorm(x)
> plot(x, y)

[Figure: cos(x) plotted against x, for x from -10 to 10]

> plot(x, z, main = "Normal distribution",
+      cex.main = 3, font.main = 6,
+      xlab = "x", ylab = "f(x)", pch = "+",
+      cex.axis = 1.5, cex.lab = 1.5, col = "red")

[Figure: "Normal distribution", f(x) plotted against x from -10 to 10 with red "+" symbols]
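The two curves can also be drawn on the same plot and saved to a file. The sketch below is one possible way to do it, assuming a PDF output in a temporary file; the file name, the title and the legend position are arbitrary choices.

```r
x <- seq(-10, 10, length.out = 200)

out <- tempfile(fileext = ".pdf")   # hypothetical output file
pdf(out)                            # open a PDF graphics device

# type = "l" draws a line instead of points
plot(x, cos(x), type = "l", xlab = "x", ylab = "value",
     main = "cos(x) and normal density")
lines(x, dnorm(x), col = "red")     # add a second curve to the same plot
legend("topright", legend = c("cos(x)", "dnorm(x)"),
       col = c("black", "red"), lty = 1)

dev.off()                           # close the device and write the file
file.exists(out)                    # TRUE
```

Without the pdf() and dev.off() calls, the same commands draw in the RStudio plot pane.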


Packages

Installation: install.packages("<the package's name>", repos = 'http://cran.us.r-project.org')

Loading: library(<the package's name>)

package name    description
ggplot2         advanced plotting
MASS            statistical tools
matlab          use MATLAB-style code
dplyr           data manipulation
doParallel      parallelisation
caret           machine learning
e1071           SVM
shiny           interfacing (web applications)
...             ...
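A common pattern is to check whether a package is available before loading it. The helper is_installed() below is a hypothetical name introduced for this sketch; it relies only on requireNamespace(), which returns TRUE or FALSE without attaching the package.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")   # TRUE: stats ships with every R installation

pkg <- "MASS"
if (is_installed(pkg)) {
  library(pkg, character.only = TRUE)   # load a package whose name is in a variable
} else {
  message("Package ", pkg, " is missing; run install.packages(\"", pkg, "\")")
}
```

Installation is needed once per machine, whereas library() must be called in every new R session.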


Useful functions

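grep:
> grep("mtpl", c("PSG", "OL", "MSCHmtplPro", "GazélecAjac"))
[1] 3

apply family (here sapply):
> df <- data.frame(A = c('hello', 'bye', 'thanks'),
+                  B = 1:3,
+                  C = c(T, F, F))
> sapply(df, class)
        A         B         C
 "factor" "integer" "logical"

(output from R 3.5.1; since R 4.0.0, column A is "character" by default)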
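A few variations on these functions, as a sketch: grep() returns positions, while its value argument and grepl() (not shown on the slide) return the matching strings or a logical vector. The vector name teams and the matrix m are introduced only for the example.

```r
teams <- c("PSG", "OL", "MSCHmtplPro", "GazélecAjac")

grep("mtpl", teams)                 # 3 (position of the match)
grep("mtpl", teams, value = TRUE)   # "MSCHmtplPro"
grepl("mtpl", teams)                # FALSE FALSE TRUE FALSE

# apply() works on the rows (1) or columns (2) of a matrix
m <- matrix(1:12, nrow = 3)
apply(m, 1, sum)   # row sums:    22 26 30
apply(m, 2, sum)   # column sums: 6 15 24 33

# sapply() applies a function to each element of a list
sapply(list(a = 1:3, b = 4:6), mean)   # a = 2, b = 5
```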


cat, paste:
> n <- 10
> cat(paste("run number", n))
run number 10

system.time (here timing the training of an SVM with the e1071 package):
> library(e1071)
> learCT <- system.time(model <- svm(target ~ ., data = trainData))

head / tail

which
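The last functions can be tried on a small vector. This is only a sketch: the vector v, the timed loop and the name timing are made up for the example, and which.max() and paste0() are close relatives that do not appear on the slide.

```r
v <- c(4, 10, 16, 3, 25)

head(v, 2)     # 4 10  (first two elements)
tail(v, 2)     # 3 25  (last two elements)
which(v > 5)   # 2 3 5 (positions where the condition is TRUE)
which.max(v)   # 5     (position of the largest value)

# system.time() returns the time taken to evaluate an expression
timing <- system.time(for (i in 1:1e5) sqrt(i))
timing["elapsed"]   # elapsed time in seconds

# cat() and paste() inside a loop, one message per line
for (n in 1:3) cat(paste("run number", n), "\n")

paste0("id", 1:3)   # "id1" "id2" "id3" (paste without separator)
```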
