Introduction to R

PRISM - Nicolas Sutton-Charani

5 January 2021

Data science Advanced statistics 5 January 2021 1 / 28

1. Introduction

2. Software installation

3. Types and basic operations with R

4. Data import

5. Data simulation

6. Plots

7. Packages

8. Useful functions

Introduction

What is R?

Programming language and statistical computing environment for data analysis

GNU package (C, Fortran)

freely available under the GNU General Public License

collaborative project

Comprehensive R Archive Network (CRAN)

History

1975: J. Chambers (Bell Laboratories) → S

1995: R. Ihaka and R. Gentleman (University of Auckland) → R

Employment

[Figure not reproduced. Source: https://r4stats.com/articles/popularity/]

R vs Python

[Figure not reproduced. Source: http://r4stats.com/articles/popularity/]

Analytic tool

[Two figures not reproduced. Source: http://r4stats.com/articles/popularity/]

Software installation

R: the software

http://www.r-project.org/ → Download R → choose one of the French mirrors → Download R for Windows/Mac/Linux → base → Download R 4.0.3 for XXX

RStudio: development environment

https://www.rstudio.com/ → Download → RStudio Desktop → choose the correct installer (Windows/Mac/Linux)

Run the two downloaded installers (.exe files on Windows)
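
Once both installers have run, a quick check in the RStudio console shows which R version is active. This is only a minimal sketch with base R functions; the 4.0.0 threshold is an arbitrary value chosen for illustration.

```r
# Version string of the running R session (the exact text depends on the install)
ver <- R.version.string
print(ver)

# Compare the running version with a minimum (4.0.0 is an illustrative threshold)
is_recent <- getRversion() >= "4.0.0"
print(is_recent)
```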

Types and basic operations with R

Variable types

No type declaration: a variable takes its type from the R object assigned to it. Object structures:

Vectors
Lists
Matrices
Arrays
Factors
Data frames

Data types:

logical (TRUE, FALSE)
numeric (e.g. 12.3, 5, 999)
character (e.g. 'a', "good", "TRUE", '23.4'), or factor when all categories (levels) are known
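
The absence of type declarations can be checked directly with class(). The sketch below is illustrative only; the variable names and values are made up for the example.

```r
# Each variable takes the type of the value assigned to it
flag  <- TRUE
num   <- 12.3
txt   <- "23.4"   # quoted, so this is text and not a number
grade <- factor(c("low", "high", "low"), levels = c("low", "high"))

types <- c(class(flag), class(num), class(txt), class(grade))
print(types)

# Text must be converted explicitly before doing arithmetic
converted <- as.numeric(txt) + 1
print(converted)

# A factor stores its set of known categories
print(levels(grade))
```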

Operators

Arithmetic              Comparison                      Logical
+    addition           <   less than                   !x         logical NOT
-    subtraction        >   greater than                x & y      logical AND (element-wise)
*    multiplication     <=  less than or equal to       x && y     logical AND (single values)
/    division           >=  greater than or equal to    x | y      logical OR (element-wise)
^    power              ==  equal to                    x || y     logical OR (single values)
%%   modulo             !=  not equal to                xor(x, y)  exclusive OR
%/%  integer division
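
A few of these operators in use, as a small sketch with made-up values. Note that & works position by position on vectors, while && is meant for single values.

```r
a <- 17
b <- 5

quotient  <- a %/% b   # integer division: 3
remainder <- a %% b    # modulo: 2
power     <- b^2       # 25

x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

elementwise_and <- x & y          # TRUE FALSE FALSE
single_and      <- x[1] && y[1]   # TRUE (both operands have length one)
exclusive_or    <- xor(x, y)      # FALSE TRUE TRUE

print(c(quotient, remainder, power))
print(elementwise_and)
print(exclusive_or)
```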

Vectors

Value assignment: '<-' or '=' (e.g. x <- 3 or x = 3)

Data generation

> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(-3, +3, length.out = 13)
 [1] -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5  3.0
> v <- c(4, 10, 16)
> v[3]
[1] 16

Functions on vectors

mean(), sum(), median()

var() and sd()

length()

summary()
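
Applied to the vector v <- c(4, 10, 16) from the slide, these functions give the values noted in the comments. This is a minimal sketch; the result names are chosen for the example.

```r
v <- c(4, 10, 16)

n_values <- length(v)   # 3
total    <- sum(v)      # 30
average  <- mean(v)     # 10
middle   <- median(v)   # 10
variance <- var(v)      # 36 (sample variance, divides by n - 1)
std_dev  <- sd(v)       # 6  (square root of var(v))

# summary() prints minimum, quartiles, median, mean and maximum
print(summary(v))
```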

Matrices

> m <- matrix(data = 1:12, nrow = 3, ncol = 4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> m[2, 3]
[1] 8
> dim(m)
[1] 3 4
> cbind(m, v)
               v
[1,] 1 4 7 10  4
[2,] 2 5 8 11 10
[3,] 3 6 9 12 16
> rbind(m, c(v, 5))
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
[4,]    4   10   16    5
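
Beyond single cells, whole rows or columns can be extracted by leaving one index empty. The sketch below also shows that matrix() fills column by column unless byrow = TRUE is given; the variable names are illustrative.

```r
m <- matrix(data = 1:12, nrow = 3, ncol = 4)

second_row   <- m[2, ]   # 2 5 8 11
third_column <- m[, 3]   # 7 8 9

# Transposition swaps the dimensions: 3 x 4 becomes 4 x 3
transposed <- t(m)
print(dim(transposed))

# Same values filled row by row instead of column by column
by_row <- matrix(data = 1:12, nrow = 3, ncol = 4, byrow = TRUE)
print(by_row[2, 3])      # 7
```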

Dataframes

special type of matrix

mixed types of data

columns indexed by name → insensitive to column reordering!

> df <- data.frame(id = c("id1", "id2", "id3", "id4", "id5"),
+                  poids = c(85, 78, 56, 102, 91),
+                  taille = c(170, 176, 155, 187, 202))
> df
   id poids taille
1 id1    85    170
2 id2    78    176
3 id3    56    155
4 id4   102    187
5 id5    91    202

(poids = weight, taille = height)

access columns by name: df$poids (vector) or df['poids'] (data frame)
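
The difference between the two access forms, and the robustness of name-based access to column reordering, can be sketched as follows. The data frame is the one from the slide; the other names are introduced for the example.

```r
df <- data.frame(id = c("id1", "id2", "id3", "id4", "id5"),
                 poids = c(85, 78, 56, 102, 91),        # weight
                 taille = c(170, 176, 155, 187, 202))   # height

as_vector <- df$poids      # plain numeric vector
as_frame  <- df["poids"]   # data frame with a single column

print(class(as_vector))
print(class(as_frame))

# Reordering the columns does not change what df$poids returns
reordered <- df[, c("taille", "poids", "id")]
same_values <- identical(reordered$poids, df$poids)
print(same_values)

mean_weight <- mean(df$poids)   # 82.4
print(mean_weight)
```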

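A data frame is a list of equal-length vectors that can hold different variable types; the function 'class' shows them:

> class(df)
[1] "data.frame"
> class(df$taille)
[1] "numeric"
> class(df$id)
[1] "character"

(In R versions before 4.0.0, data.frame() converts strings by default and class(df$id) returns "factor".)

Variable names allow selection and filtering:

> df1 <- subset(df, select = c(id, taille))   # equivalent to: df1 <- subset(df, select = -poids)
> df2 <- df[df$poids > 80, ]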
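
Selection and filtering can also be combined in one subset() call. The following is a sketch on the slide's data frame; the thresholds and result names are chosen only for illustration.

```r
df <- data.frame(id = c("id1", "id2", "id3", "id4", "id5"),
                 poids = c(85, 78, 56, 102, 91),
                 taille = c(170, 176, 155, 187, 202))

# Two ways of keeping the columns id and taille
df1     <- subset(df, select = c(id, taille))
df1_bis <- subset(df, select = -poids)
print(identical(names(df1), names(df1_bis)))

# Row filter with logical indexing: rows where poids exceeds 80
df2 <- df[df$poids > 80, ]
print(df2$id)

# Row condition and column selection in a single call
heavy_not_tall <- subset(df, poids > 80 & taille < 200, select = c(id, poids))
print(heavy_not_tall)
```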

Data import

Import

from a text or csv file:
> df <- read.table(file = "file.txt", sep = ";", dec = ",", header = TRUE)

from a csv file:
> df <- read.csv(file = "file.csv")

from an Excel file:
> library(xlsx)
> df <- read.xlsx("my_file.xls", sheetIndex = 2)

from a database:
> library(RODBC)
> connexion <- odbcDriverConnect('driver={SQL Server};server=mysqlhost;database=mydbname;trusted_connection=true')
> df <- sqlQuery(connexion, 'SELECT * FROM information_schema.tables')
> odbcClose(connexion)
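
The effect of sep and dec can be tried without any external file by writing a small file to a temporary location first. This is a sketch under that assumption; the file content is invented for the example.

```r
# Write a small semicolon-separated file with decimal commas to a temporary path
path <- tempfile(fileext = ".txt")
writeLines(c("id;poids;taille",
             "id1;85,5;170",
             "id2;78,0;176"), path)

# sep gives the column separator, dec the decimal mark, header the first-line names
df <- read.table(file = path, sep = ";", dec = ",", header = TRUE)
str(df)

# read.csv2() uses ";" and "," as its defaults, so it reads the same file directly
df_bis <- read.csv2(file = path)

# Remove the temporary file
unlink(path)
```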

Data simulation

Simulations

Simple random sampling: s <- sample(N, size = 7, replace = FALSE) draws 7 distinct values from 1, ..., N

Random generation (n values per call)

Normal (Gauss): v <- rnorm(n, mean = 0, sd = 1)

Poisson: v <- rpois(n, lambda)

Binomial: v <- rbinom(n, size, prob)

...

Corresponding density / probability functions: dnorm, dpois, dbinom, ...
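
A short simulation sketch tying these functions together. The seed, sample size and distribution parameters below are arbitrary choices for the example; set.seed() only makes the draws reproducible.

```r
set.seed(42)   # arbitrary seed, for reproducibility

# 7 distinct values drawn from 1, ..., 10
picked <- sample(10, size = 7, replace = FALSE)
print(picked)

n <- 1000
v_norm  <- rnorm(n, mean = 0, sd = 1)
v_pois  <- rpois(n, lambda = 3)
v_binom <- rbinom(n, size = 10, prob = 0.5)

# Empirical means should be close to the theoretical ones (0, 3 and 5)
print(c(mean(v_norm), mean(v_pois), mean(v_binom)))

# d* functions evaluate the density or probability at a point
density_at_zero <- dnorm(0)                          # 1 / sqrt(2 * pi)
prob_five       <- dbinom(5, size = 10, prob = 0.5)  # 252 / 1024
print(c(density_at_zero, prob_five))
```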

Plots

> x <- seq(-10, +10, length.out = 10000)
> y <- cos(x)
> z <- dnorm(x)
> plot(x, y)

[Figure: cos(x) against x, for x from -10 to 10]

> plot(x, z, main = "Normal distribution", cex.main = 3, font.main = 6,
+      xlab = "x", ylab = "f(x)", pch = "+",
+      cex.axis = 1.5, cex.lab = 1.5, col = "red")

[Figure: "Normal distribution", f(x) against x for x from -10 to 10, drawn with "+" symbols]
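
With 10000 points, a curve is often easier to read as a line than as "+" symbols. The sketch below is one possible variant, not the slide's own code: it draws lines with type = "l", overlays a second curve with lines(), and writes the result to a temporary PDF file (the file location and the second distribution are assumptions made for the example).

```r
x <- seq(-10, 10, length.out = 200)

# Send the plot to a temporary PDF file instead of the screen
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# type = "l" draws a continuous line instead of one symbol per point
plot(x, dnorm(x), type = "l", col = "red",
     main = "Normal distribution", xlab = "x", ylab = "f(x)")

# lines() adds a second curve to the existing plot
lines(x, dnorm(x, mean = 2, sd = 2), col = "blue", lty = 2)

legend("topleft",
       legend = c("mean 0, sd 1", "mean 2, sd 2"),
       col = c("red", "blue"), lty = c(1, 2))

# Close the device so the file is written
dev.off()
print(file.exists(out_file))
```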

Packages

Installation: install.packages("<the package's name>", repos = "http://cran.us.r-project.org")

Loading: library("<the package's name>")

package name    description
ggplot2         advanced plotting
MASS            statistical tools
matlab          use MATLAB code
dplyr           data manipulation
doParallel      parallelisation
caret           machine learning
e1071           SVM
shiny           interfacing
...
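
A common pattern is to install a package only when it is not already available. The helper below, install_if_missing, is a hypothetical function written for this example (it is not part of R); it relies on the base functions requireNamespace() and install.packages().

```r
# Hypothetical helper: install a package only if it cannot be found
install_if_missing <- function(pkg, repos = "http://cran.us.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package is available after the attempt
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers a download
ok <- install_if_missing("stats")
print(ok)
```

After a successful call, library("<the package's name>") loads the package as usual.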

Useful functions

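grep:
> grep("mtpl", c("PSG", "OL", "MSCHmtplPro", "GazelecAjac"))
[1] 3

apply (here sapply, which applies a function to each column):
> df <- data.frame(A = c("hello", "bye", "thanks"),
+                  B = 1:3,
+                  C = c(TRUE, FALSE, FALSE))
> sapply(df, class)
          A           B           C
"character"   "integer"   "logical"

(In R versions before 4.0.0, column A is reported as "factor".)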
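
A slightly broader sketch of the same functions: sapply() with a custom function, apply() over the rows and columns of a matrix, and grep() returning the matching strings instead of their positions. The result names are chosen for the example.

```r
df <- data.frame(A = c("hello", "bye", "thanks"),
                 B = 1:3,
                 C = c(TRUE, FALSE, FALSE))

# sapply: one result per column of the data frame
n_unique <- sapply(df, function(col) length(unique(col)))
print(n_unique)   # A = 3, B = 3, C = 2

# apply: second argument 1 works over rows, 2 over columns
m <- matrix(1:12, nrow = 3, ncol = 4)
row_sums <- apply(m, 1, sum)   # 22 26 30
col_max  <- apply(m, 2, max)   # 3 6 9 12
print(row_sums)
print(col_max)

# grep with value = TRUE returns the matching elements themselves
hits <- grep("mtpl", c("PSG", "OL", "MSCHmtplPro", "GazelecAjac"), value = TRUE)
print(hits)
```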

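cat, paste:
> n <- 10
> cat(paste("run number", n))
run number 10

system.time (here timing an SVM fit with svm() from the e1071 package; target and trainData stand for the user's own data):
> library(e1071)
> learCT <- system.time(
+   model <- svm(target ~ ., data = trainData)
+ )

head/tail

which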
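
The last three items can be tried with base R alone. In the sketch below, sorting a random vector stands in for the SVM fit so that no extra package or data set is needed; the seed and vector size are arbitrary.

```r
set.seed(1)             # arbitrary seed
values <- rnorm(1e5)

# system.time() runs the expression and reports user, system and elapsed time
timing <- system.time(
  sorted <- sort(values)
)
print(timing["elapsed"])

# head() and tail() show the first and last elements (here 3 of each)
print(head(sorted, 3))
print(tail(sorted, 3))

# which() returns the positions where a condition is TRUE
v <- c(4, 10, 16, 10)
positions <- which(v == 10)   # 2 4
largest   <- which.max(v)     # 3
print(positions)
print(largest)
```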
