David Chiu R Language Tutorial 1 10/30/2022 Confidential | Copyright 2013 Trend Micro Inc.

Page 1: R language tutorial

David Chiu

R Language Tutorial

04/11/2023

Page 2: R language tutorial

Background of R

Page 3: R language tutorial

What is R?

• GNU project implementing the S language, which John Chambers and colleagues developed at Bell Labs

• Free software environment for statistical computing and graphics

• Functional programming language written primarily in C, Fortran, and R

Page 4: R language tutorial

R Language

• R is a functional programming language

• R is an interpreted language

• R is an object-oriented language

Page 5: R language tutorial

Why Use R

• Statistical analysis on the fly

• Built-in mathematical functions and graphics modules

• Free and open source: http://cran.r-project.org/src/base/

Page 6: R language tutorial

Kaggle

http://www.kaggle.com/

R is the language most widely used by Kaggle participants

Page 7: R language tutorial

Data Scientists at These Companies Use R

What is your programming language of choice, R, Python or something else?

“I use R, and occasionally matlab, for data analysis. There is a large, active and extremely knowledgeable R community at Google.”
– http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/

“Expert knowledge of SAS (With Enterprise Guide/Miner) required and candidates with strong knowledge of R will be preferred”
– http://www.kdnuggets.com/jobs/13/03-29-apple-sr-data-scientist.html?utm_source=twitterfeed&utm_medium=facebook&utm_campaign=tfb&utm_content=FaceBook&utm_term=analytics#.UVXibgXOpfc.facebook

Page 8: R language tutorial

Commercial support for R

• Since 2007, Revolution Analytics has provided commercial support for Revolution R
– http://www.revolutionanalytics.com/products/revolution-r.php
– http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php

• Oracle Big Data Appliance integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with the Exadata hardware
– http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html

Page 9: R language tutorial

Revolution R

• Free Community version
– http://www.revolutionanalytics.com/downloads/
– http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php

                     Base R 2.14.2 64   Revolution R (1-core)   Revolution R (4-core)   Speedup (4 core)
Matrix Calculation   17.4 sec           2.9 sec                 2.0 sec                 7.9x
Matrix Functions     10.3 sec           2.0 sec                 1.2 sec                 7.8x
Program Control      2.7 sec            2.7 sec                 2.7 sec                 Not Appreciable

Page 10: R language tutorial

IDE

RStudio

• http://www.rstudio.com/

RGUI

• http://www.r-project.org/

Page 11: R language tutorial

Web App Development

Shiny makes it super simple for R users like you to turn analyses into interactive web applications that anyone can use

http://www.rstudio.com/shiny/

Page 12: R language tutorial

Package Management

• CRAN (Comprehensive R Archive Network)

Repository     URL
CRAN           http://cran.r-project.org/web/packages/
Bioconductor   http://www.bioconductor.org/packages/release/Software.html
R-Forge        http://r-forge.r-project.org/

Page 13: R language tutorial

R Basic

Page 14: R language tutorial

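Basic Commands

• help(): show documentation
– help(demo)

• demo(): run a demonstration script
– demo(is.things)

• q(): quit the R session

• ls(): list the objects in the workspace

• rm(): remove objects
– rm(x)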

Page 15: R language tutorial

Basic Object

• Vector

• List

• Factor

• Array

• Matrix

• Data Frame

Page 16: R language tutorial

Objects & Arithmetic

• Scalar
– x = 3; y <- 5; x + y

• Vectors
– x = c(1,2,3,7); y = c(2,3,5,1); x+y; x*y; x-y; x/y
– x = seq(1,10); y = 2:11; x+y
– x = seq(1,10,by=2); y = seq(1,10,length=2)
– rep(c(5,8), 3)
– x = c(1,2,3); length(x)
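As a rough illustration of the vector commands above, the sketch below annotates the values they produce; arithmetic on vectors is element-wise. The names `odds` and `ends` are illustrative and do not appear on the slide.

```r
# Element-wise arithmetic on two vectors of equal length
x <- c(1, 2, 3, 7)
y <- c(2, 3, 5, 1)
x + y   # 3 5 8 8
x * y   # 2 6 15 7
x - y   # -1 -1 -2 6

# seq() builds a sequence by step size or by target length
odds <- seq(1, 10, by = 2)          # 1 3 5 7 9
ends <- seq(1, 10, length.out = 2)  # 1 10

# rep() repeats a whole vector
rep(c(5, 8), 3)  # 5 8 5 8 5 8
length(x)        # 4
```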

Page 17: R language tutorial

Summaries and Subscripting

• Summary
– x = c(1,2,3,4,5,6,7,8,9,10)
– mean(x); min(x); median(x); max(x); var(x)
– summary(x)

• Subscripting
– x = c(1,2,3,4,5,6,7,8,9,10)
– x[1:3]; x[c(1,3,5)]
– x[c(1,3,5)] * 2 + x[c(2,2,2)]
– x[-(1:6)]
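The subscripting forms above can be sketched as follows, with the resulting values in comments. The logical-index line is an addition not shown on the slide, and the variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

first_three <- x[1:3]      # positions 1 to 3: 1 2 3
without_six <- x[-(1:6)]   # negative indices drop positions: 7 8 9 10
combined <- x[c(1, 3, 5)] * 2 + x[c(2, 2, 2)]  # (2 6 10) + (2 2 2) = 4 8 12

# Logical subscripting (not on the slide): keep elements that satisfy a condition
evens <- x[x %% 2 == 0]    # 2 4 6 8 10

summary(x)  # minimum, quartiles, median, mean, and maximum in one call
```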

Page 18: R language tutorial

Lists

• Contain a heterogeneous selection of objects
– e <- list(thing="hat", size="8.25"); e
– l <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
– l$j
– man = list(name="Qoo", height=183); man$name
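A minimal sketch of the common ways to read and extend a list, reusing the `man` example above. The `weight` element and its value are made up for illustration.

```r
# A list can mix types: here a character string and a number
man <- list(name = "Qoo", height = 183)

man$name         # access by name with $: "Qoo"
man[["height"]]  # access by name with [[ ]]: 183
man[[1]]         # access by position: "Qoo"

# Single brackets return a sub-list rather than the element itself
sub <- man["height"]
class(sub)       # "list"

# Assigning to a new name adds an element
man$weight <- 70  # illustrative value, not from the slides
names(man)        # "name" "height" "weight"
```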

Page 19: R language tutorial

Factor

• Ordered collection of items used to represent categorical values

• The different values that a factor can take are called levels

• Factors
– phone = factor(c('iphone', 'htc', 'iphone', 'samsung', 'iphone', 'samsung'))
– levels(phone)
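To make the idea of levels concrete, the sketch below inspects the `phone` factor from the slide. It assumes the default behaviour of `factor()`, which sorts the distinct values to form the levels; `counts` is an illustrative name.

```r
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

levels(phone)      # distinct values in sorted order: "htc" "iphone" "samsung"
nlevels(phone)     # 3
as.integer(phone)  # internal integer codes: 2 1 2 3 2 3

# table() counts how often each level occurs: htc 1, iphone 3, samsung 2
counts <- table(phone)
```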

Page 20: R language tutorial

Matrices & Array

• Array
– An extension of a vector to two or more dimensions
– a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12),dim=c(3,4))

• Matrices
– An extension of a vector to two dimensions (a 2-D array)
– x = c(1,2,3); y = c(4,5,6); rbind(x,y); cbind(x,y)
– x = rbind(c(1,2,3),c(4,5,6)); dim(x)
– x <- matrix(c(1,2,3,4,5,6),nrow=3)
– x <- matrix(c(1,2,3,4,5,6),nrow=3,byrow=TRUE)
– x <- matrix(c(1,2,3,4),nrow=2); y <- matrix(c(5,6),nrow=2); x %*% y
– t(matrix(c(1,2,3,4),nrow=2))
– solve(matrix(c(1,2,3,4),nrow=2))

Page 21: R language tutorial

Data Frame

• Useful way to represent tabular data

• Essentially a matrix with named columns; the columns may also hold non-numerical variables

• Example
– df = data.frame(a=c(1,2,3,4,5),b=c(2,3,4,5,6)); df
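The example above uses only numeric columns. As a hedged sketch of the mixed-type case, the data frame below combines a character column with a numeric one; the column names and values are invented for illustration.

```r
# A data frame whose columns have different types (illustrative values)
df <- data.frame(
  name  = c("a", "b", "c"),
  score = c(80, 90, 70),
  stringsAsFactors = FALSE
)

df$score             # one column as a vector: 80 90 70
df[df$score > 75, ]  # rows where score exceeds 75
str(df)              # compact overview of the column types

# Row filter and column selection combined
high <- df[df$score > 75, "name"]  # "a" "b"
```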

Page 22: R language tutorial

Function

• Function
– `%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1
– f <- function(x) {return(x^2 + 3)}
– create.vector.of.ones <- function(n) {
    return.vector <- NA
    for (i in 1:n) { return.vector[i] <- 1 }
    return.vector
  }
– create.vector.of.ones(3)

• Control Structures
– if … else …
– repeat, for, while

• Catch error
– tryCatch
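The control structures and error handling listed above can be sketched as follows. The function and variable names (`sign.label`, `safe.log`, `squares`, `steps`, `count`) are hypothetical and chosen only for this example.

```r
# if ... else chooses between branches
sign.label <- function(x) {
  if (x > 0) "positive" else if (x < 0) "negative" else "zero"
}

# for loop filling a vector of known length
squares <- numeric(5)
for (i in 1:5) squares[i] <- i^2

# while loop: halve n until it drops below 1
n <- 20; steps <- 0
while (n >= 1) { n <- n / 2; steps <- steps + 1 }

# repeat loops until break is called
count <- 0
repeat { count <- count + 1; if (count == 3) break }

# tryCatch: return a fallback value when an error is raised
safe.log <- function(x) {
  tryCatch(log(x), error = function(e) NA)
}
safe.log("a")  # log() of a string raises an error, so NA is returned
```

Note that `tryCatch` is case-sensitive; `trycatch` is not a function in base R.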

Page 23: R language tutorial

Anonymous Function

• Functional language characteristic
– apply.to.three <- function(f) {f(3)}
– apply.to.three(function(x) {x * 7})
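A short sketch of why this matters in practice: functions are ordinary values, so they can be passed to other functions or returned from them. `make.multiplier` and `times.seven` are hypothetical names introduced for this example.

```r
apply.to.three <- function(f) { f(3) }
apply.to.three(function(x) { x * 7 })   # 21

# Anonymous functions are commonly passed to the apply family
sapply(1:3, function(x) x * 7)          # 7 14 21

# Functions can also be returned from other functions (closures)
make.multiplier <- function(k) { function(x) x * k }  # hypothetical helper
times.seven <- make.multiplier(7)
times.seven(3)                          # 21
```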

Page 24: R language tutorial

Objects and Classes

• All R code manipulates objects.

• Every object in R has a type

• In assignment statements, R copies the object, not just the reference to the object

• Attributes
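The points on this slide can be checked directly at the prompt. The following is a minimal sketch with illustrative variable names.

```r
# Assignment copies the value: changing b leaves a untouched
a <- c(1, 2, 3)
b <- a
b[1] <- 100
a  # 1 2 3
b  # 100 2 3

# Every object has a type and a class
typeof(a)  # "double"
class(a)   # "numeric"

# Attributes attach metadata such as names or dimensions to an object
names(a) <- c("x", "y", "z")
attributes(a)  # $names: "x" "y" "z"
```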

Page 25: R language tutorial

S3 & S4 Object

• Many R functions were implemented using S3 methods

• S version 4 (hence S4) introduced formal classes and methods, which allow
– Multiple arguments (for method dispatch)
– Abstract types
– Inheritance
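For contrast with the formal S4 system, here is a minimal sketch of the older S3 style, in which a class is just an attribute and a generic dispatches on the class of its first argument. The `student` class and the `describe` generic are hypothetical names, not part of the slides.

```r
# An S3 object is an ordinary object with a class attribute
s <- structure(list(name = "david", score = 80), class = "student")

# An S3 generic dispatches on the class of its first argument
describe <- function(obj, ...) UseMethod("describe")
describe.student <- function(obj, ...) {
  paste(obj$name, "scored", obj$score)
}
describe.default <- function(obj, ...) "unknown object"

describe(s)   # "david scored 80"
describe(42)  # "unknown object"
```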

Page 26: R language tutorial

OOP of S4

• S4 OOP Example
– setClass("Student", representation(name = "character", score = "numeric"))
– studenta = new("Student", name = "david", score = 80)
– studentb = new("Student", name = "andy", score = 90)
– setMethod("show", signature("Student"), function(object) { cat(object@score + 100) })
– setGeneric("getscore", function(object) standardGeneric("getscore"))
– studenta

Page 27: R language tutorial

Packages

• A package is a related set of functions, help files, and data files that have been bundled together.

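• Basic Commands
– library(rpart): load an installed package
– install.packages("rpart"): install a package from CRAN
– (.packages()): list the currently attached packages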

Page 28: R language tutorial

Packages Used in Machine Learning for Hackers

Page 29: R language tutorial

Apply

• apply
– Returns a vector, array, or list of values obtained by applying a function to the margins of an array or matrix
– data <- cbind(c(1,2),c(3,4))
– data.rowsum <- apply(data,1,sum)
– data.colsum <- apply(data,2,sum)
– data

Page 30: R language tutorial

Apply

• lapply
– Returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X

• sapply
– A user-friendly version and wrapper of lapply that by default returns a vector, matrix or, if simplify = "array", an array if appropriate

• vapply
– Similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use
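The difference between the three functions is easiest to see on the same input. This is a small sketch using made-up data; `scores`, `means`, and `lens` are illustrative names.

```r
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7))  # illustrative data

lapply(scores, mean)  # always a list: $a is 2, $b is 5.5
sapply(scores, mean)  # simplified to a named numeric vector: a = 2, b = 5.5

# vapply needs a template for each result and stops with an error
# if any result does not match it
means <- vapply(scores, mean, FUN.VALUE = numeric(1))
lens  <- vapply(scores, length, FUN.VALUE = integer(1))
```

Because `vapply` checks each result against `FUN.VALUE`, a surprise in the data shows up as an error at the call site instead of a silently different return type.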

Page 31: R language tutorial

File IO

• Save and Load
– x = USPersonalExpenditure
– save(x, file="~/test.RData")
– rm(x)
– load("~/test.RData")
– x

Page 32: R language tutorial

Charts and Graphics

Page 33: R language tutorial

Plotting Example

– xrange = range(as.numeric(colnames(USPersonalExpenditure)))
– yrange = range(USPersonalExpenditure)
– plot(xrange, yrange, type="n", xlab="Year", ylab="Category")
– for (i in 1:5) { lines(as.numeric(colnames(USPersonalExpenditure)), USPersonalExpenditure[i,], type="b", lwd=1.5) }

Page 34: R language tutorial

IRIS Dataset

• data()
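Calling `data()` with no arguments lists the available data sets. As a brief sketch, the built-in `iris` data set can then be loaded and inspected like this:

```r
data(iris)           # load the built-in iris data set into the workspace

dim(iris)            # 150 rows, 5 columns
names(iris)          # four measurements plus Species
head(iris, 3)        # first three rows
table(iris$Species)  # 50 observations for each of the three species
summary(iris$Sepal.Length)
```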

Page 36: R language tutorial

Classification of IRIS

• Classification Example
– install.packages("e1071")
– library(e1071)
– pairs(iris[1:4], main="Iris Data (red=setosa,green=versicolor,blue=virginica)", pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])
– classifier <- naiveBayes(iris[,1:4], iris[,5])
– table(predict(classifier, iris[,-5]), iris[,5])
– classifier <- svm(iris[,1:4], iris[,5])
– table(predict(classifier, iris[,-5]), iris[,5])
– prediction = predict(classifier, iris[,1:4])

• http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/Na%C3%AFve_Bayes

Page 37: R language tutorial

Performance Tips

• Use Built-in Math Functions

• Use Environments for Lookup Tables

• Use a Database to Query Large Data Sets

• Preallocate Memory

• Monitor How Much Memory You Are Using

• Cleaning Up Objects

• Functions for Big Data Sets

• Parallel Computation with R
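A few of these tips can be illustrated with base R alone. The sketch below covers preallocation, vectorized built-ins, environment-based lookup tables, and memory housekeeping; the variable names and lookup values are invented, and no timing claims are made.

```r
n <- 10000

# Preallocate memory: create the full vector once, then fill it in place
filled <- numeric(n)
for (i in seq_len(n)) filled[i] <- i * 2

# Use built-in (vectorized) math instead of an explicit loop
vectorized <- seq_len(n) * 2
same <- isTRUE(all.equal(filled, vectorized))  # both approaches agree

# Use an environment as a lookup table keyed by name
lookup <- new.env()
assign("htc", 1, envir = lookup)
assign("iphone", 3, envir = lookup)
get("iphone", envir = lookup)                        # 3
exists("samsung", envir = lookup, inherits = FALSE)  # FALSE

# Monitor memory use and clean up objects that are no longer needed
object.size(filled)  # approximate size of one object
rm(filled)
invisible(gc())      # ask R to run garbage collection
```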

Page 38: R language tutorial

R for Machine Learning

Page 39: R language tutorial

Getting Help on a Topic

• ?read.delim – # Access a function's help file

• ??base::delim – # Search for 'delim' in all help files for functions in 'base'

• help.search("delimited") – # Search for 'delimited' in all help files

• RSiteSearch("parsing text") – # Search for the term 'parsing text' on the R site.

Page 40: R language tutorial

Sample Code of Chapter 1

• https://github.com/johnmyleswhite/ML_for_Hackers.git

Page 41: R language tutorial

Reference & Resource

Page 42: R language tutorial

Study Material

• R in a nutshell

Page 43: R language tutorial

Online Reference

Page 44: R language tutorial

Community Resources for R help

Page 45: R language tutorial

Resource

• Websites
– Stack Overflow
– Cross Validated
– R-help
– R-devel
– R-sig-*
– Package-specific mailing list

• Blog
– R-bloggers

• Twitter
– https://twitter.com/#rstats

• Quora
– http://www.quora.com/R-software

Page 46: R language tutorial

Resource (Cont’d)

• Conference
– useR!
– R in Finance
– R in Insurance
– Others: Joint Statistical Meetings, Royal Statistical Society Conference

• Local User Group
– http://blog.revolutionanalytics.com/local-r-groups.html

• Taiwan R User Group
– http://www.facebook.com/Tw.R.User
– http://www.meetup.com/Taiwan-R/

Page 47: R language tutorial

Thank You!