R language tutorial

Confidential | Copyright 2013 Trend Micro Inc.

David Chiu

R Language Tutorial

04/11/2023

Background of R

What is R?

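• A GNU project. R is an implementation of the S language, which John Chambers and colleagues developed at Bell Labs; R itself was created by Ross Ihaka and Robert Gentleman at the University of Auckland

• Free software environment for statistical computing and graphics

• Functional programming language written primarily in C, Fortran, and R itself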

R Language

• R is a functional programming language

• R is an interpreted language

• R is an object-oriented language

Why Use R

• Statistical analysis on the fly

• Built-in mathematical functions and graphics

• Free and open source! – http://cran.r-project.org/src/base/

Kaggle

http://www.kaggle.com/

R is the language most widely used by Kaggle participants

Data Scientists at These Companies Use R

What is your programming language of choice, R, Python or something else?  

“I use R, and occasionally matlab, for data analysis. There is a large, active and extremely knowledgeable R community at Google.” (Nick Chamandy, statistician at Google)
– http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/

“Expert knowledge of SAS (With Enterprise Guide/Miner) required and candidates with strong knowledge of R will be preferred” (Apple, Sr. Data Scientist job posting)
– http://www.kdnuggets.com/jobs/13/03-29-apple-sr-data-scientist.html?utm_source=twitterfeed&utm_medium=facebook&utm_campaign=tfb&utm_content=FaceBook&utm_term=analytics#.UVXibgXOpfc.facebook

Commercial support for R

• Since 2007, Revolution Analytics has provided commercial support for Revolution R
– http://www.revolutionanalytics.com/products/revolution-r.php
– http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php

• Oracle Big Data Appliance, which integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with Exadata hardware
– http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html

Revolution R

• The Community version is free
– http://www.revolutionanalytics.com/downloads/

• Benchmarks
– http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php

Benchmark            Base R 2.14.2 (64-bit)   Revolution R (1-core)   Revolution R (4-core)   Speedup (4-core)
Matrix Calculation   17.4 sec                 2.9 sec                 2.0 sec                 7.9x
Matrix Functions     10.3 sec                 2.0 sec                 1.2 sec                 7.8x
Program Control      2.7 sec                  2.7 sec                 2.7 sec                 Not appreciable

IDE

RStudio

• http://www.rstudio.com/

RGUI

• http://www.r-project.org/

Web App Development

Shiny makes it super simple for R users like you to turn analyses into interactive web applications that anyone can use

http://www.rstudio.com/shiny/

Package Management

• CRAN (Comprehensive R Archive Network)

Repository     URL
CRAN           http://cran.r-project.org/web/packages/
Bioconductor   http://www.bioconductor.org/packages/release/Software.html
R-Forge        http://r-forge.r-project.org/

R Basic

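Basic Commands

• help(): open the documentation for a topic
– help(demo)

• demo(): run a demonstration script
– demo(is.things)

• q(): quit the R session

• ls(): list the objects in the workspace

• rm(): remove objects
– rm(x)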
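A short session sketch of ls() and rm(); the object names x and y are arbitrary examples, and exists(..., inherits = FALSE) is used to check only the current environment.

```r
# Create two objects in the workspace
x <- 3
y <- 5
ls()                             # character vector of object names, includes "x" and "y"

# Remove x; y is left untouched
rm(x)
exists("x", inherits = FALSE)    # FALSE: x is gone from this environment
exists("y", inherits = FALSE)    # TRUE
```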

Basic Objects

• Vector

• List

• Factor

• Array

• Matrix

• Data Frame
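One way to get a feel for the six basic objects is to build a tiny example of each and ask class() how R sees it. The values below are arbitrary; class(o)[1] is taken because a matrix reports c("matrix", "array") in recent R versions.

```r
v  <- c(1.5, 2, 3)                              # numeric vector
l  <- list(id = 1, tags = c("a", "b"))          # list of mixed objects
f  <- factor(c("low", "high", "low"))           # factor with two levels
a  <- array(1:24, dim = c(2, 3, 4))             # 3-dimensional array
m  <- matrix(1:6, nrow = 2)                     # 2 x 3 matrix
df <- data.frame(id = 1:3, grp = c("x", "y", "x"))

# class() reports how R will treat each object
cls <- sapply(list(v = v, l = l, f = f, a = a, m = m, df = df),
              function(o) class(o)[1])
cls   # numeric, list, factor, array, matrix, data.frame
```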

Objects & Arithmetic

• Scalar
– x = 3; y <- 5; x + y

• Vectors
– x = c(1,2,3,7); y = c(2,3,5,1); x+y; x*y; x-y; x/y
– x = seq(1,10); y = 2:11; x+y
– x = seq(1,10,by=2); y = seq(1,10,length.out=2)
– rep(c(5,8), 3)
– x = c(1,2,3); length(x)
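Arithmetic on vectors is element-wise, and when lengths differ R recycles the shorter vector. The sketch below shows the results of the slide's operations plus the `each` argument of rep(), which is not on the slide.

```r
x <- c(1, 2, 3, 7)
y <- c(2, 3, 5, 1)

x + y        # 3 5 8 8   (element-wise)
x * y        # 2 6 15 7

# Recycling: the shorter vector is repeated to match the longer one
c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24

seq(1, 10, by = 2)           # 1 3 5 7 9
seq(1, 10, length.out = 4)   # 1 4 7 10
rep(c(5, 8), 3)              # 5 8 5 8 5 8
rep(c(5, 8), each = 3)       # 5 5 5 8 8 8
```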

Summaries and Subscripting

• Summary
– x = c(1,2,3,4,5,6,7,8,9,10)
– mean(x); min(x); median(x); max(x); var(x)
– summary(x)

• Subscripting
– x = c(1,2,3,4,5,6,7,8,9,10)
– x[1:3]; x[c(1,3,5)]
– x[c(1,3,5)] * 2 + x[c(2,2,2)]
– x[-(1:6)]
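The values the summary and subscripting commands produce for x = 1:10 are listed below as comments. Logical indexing (x[x > 7]) is added as a common companion to the positional and negative indices on the slide.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

mean(x)      # 5.5
median(x)    # 5.5
var(x)       # 9.166667 (sample variance, denominator n - 1)

x[1:3]                              # 1 2 3
x[c(1, 3, 5)] * 2 + x[c(2, 2, 2)]   # 4 8 12
x[-(1:6)]                           # 7 8 9 10 (negative indices drop elements)
x[x > 7]                            # 8 9 10   (logical index)
```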

Lists

• A list can contain a heterogeneous collection of objects
– e <- list(thing="hat", size="8.25"); e
– l <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
– l$j
– man = list(name="Qoo", height=183); man$name
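Lists support three access styles that are easy to confuse: $ and [[ ]] return a single element, while [ ] returns a smaller list. A small sketch reusing the slide's man list (the weight field is an arbitrary addition):

```r
man <- list(name = "Qoo", height = 183)

man$name          # "Qoo"   ($ extracts one element by name)
man[["height"]]   # 183     ([[ ]] also extracts one element)
man["height"]     # a sub-list containing only height

man$weight <- 70  # lists grow by assigning to a new name
names(man)        # "name" "height" "weight"
length(man)       # 3
```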

Factor

• A factor stores categorical values; internally each value is an integer code

• The distinct values that a factor can take are called levels

• Factors
– phone = factor(c('iphone', 'htc', 'iphone', 'samsung', 'iphone', 'samsung'))
– levels(phone)
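Continuing the phone example: by default the levels are sorted, and each element is stored as an index into them. When the categories have a natural order, an ordered factor makes comparisons meaningful; the size values here are made up for illustration.

```r
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

levels(phone)      # "htc" "iphone" "samsung" (sorted by default)
as.integer(phone)  # 2 1 2 3 2 3: the integer codes stored internally
table(phone)       # htc 1, iphone 3, samsung 2

# Ordered factor: levels given explicitly, so < and > compare by level order
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size < "large"     # TRUE FALSE TRUE
```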

Matrices & Arrays

• Array
– An extension of a vector to two or more dimensions
– a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12), dim=c(3,4))

• Matrices
– A vector arranged in two dimensions (a 2-d array)
– x = c(1,2,3); y = c(4,5,6); rbind(x,y); cbind(x,y)
– x = rbind(c(1,2,3),c(4,5,6)); dim(x)
– x <- matrix(c(1,2,3,4,5,6), nrow=3)
– x <- matrix(c(1,2,3,4,5,6), nrow=3, byrow=TRUE)
– x <- matrix(c(1,2,3,4), nrow=2); y <- matrix(c(5,6), nrow=2); x %*% y   # matrix product
– t(matrix(c(1,2,3,4), nrow=2))       # transpose
– solve(matrix(c(1,2,3,4), nrow=2))   # inverse
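The matrix commands are easy to mix up with element-wise operators, so this sketch spells out the results for the slide's 2 x 2 matrix. solve(m, b), which solves a linear system without forming the inverse, is an extra not shown on the slide.

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column-wise: [1 3; 2 4]
v <- matrix(c(5, 6), nrow = 2)         # 2 x 1 column vector

m %*% v          # matrix product: [23; 34]
m * m            # element-wise product, NOT matrix multiplication
t(m)             # transpose: [1 2; 3 4]

m_inv <- solve(m)                  # inverse of m
all.equal(m %*% m_inv, diag(2))    # TRUE (up to floating-point error)

solve(m, c(5, 6))   # solves m %*% x = b directly: -1 2
```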

Data Frame

• Useful way to represent tabular data

• Essentially a matrix with named columns, except that columns may hold different types, including non-numeric variables

• Example
– df = data.frame(a=c(1,2,3,4,5), b=c(2,3,4,5,6)); df
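A data frame earns its keep when columns have different types. The sketch below uses made-up student records to show column access, row filtering with a logical condition, and adding a derived column.

```r
df <- data.frame(
  name  = c("david", "andy", "qoo"),
  score = c(80, 90, 75),
  pass  = c(TRUE, TRUE, FALSE)
)

str(df)                  # each column keeps its own type
df$score                 # 80 90 75: one column as a vector
df[df$score > 78, ]      # only the rows where score > 78
df[, c("name", "score")] # a subset of columns
nrow(df); ncol(df)       # 3 3
df$grade <- ifelse(df$score >= 80, "A", "B")   # add a derived column
```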

Function

• Function
– `%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1   # user-defined infix operator, returns 4
– f <- function(x) {return(x^2 + 3)}
– create.vector.of.ones <- function(n) {
    return.vector <- numeric(n)   # preallocate n zeros
    for (i in seq_len(n)) { return.vector[i] <- 1 }
    return.vector
  }
– create.vector.of.ones(3)

• Control structures
– if ... else
– repeat, for, while

• Error handling
– tryCatch()
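The control structures and tryCatch() listed above can be seen together in one small sketch. The functions classify() and safe_log() are hypothetical examples written for illustration; the handlers turn a warning into NA and an error into its message string.

```r
# A function with a default argument and if / else if / else branches
classify <- function(score, pass_mark = 60) {
  if (score >= 90) {
    "excellent"
  } else if (score >= pass_mark) {
    "pass"
  } else {
    "fail"
  }
}

# for loop over a vector
results <- character(0)
for (s in c(95, 70, 40)) results <- c(results, classify(s))
results   # "excellent" "pass" "fail"

# while and repeat
i <- 0
while (i < 3) i <- i + 1                 # i is 3 afterwards
repeat { i <- i - 1; if (i == 0) break } # counts back down to 0

# tryCatch: handle a warning or an error instead of aborting
safe_log <- function(x) {
  tryCatch(log(x),
           warning = function(w) NA,                  # log(-1) warns "NaNs produced"
           error   = function(e) conditionMessage(e)) # e.g. log("a")
}
safe_log(100)   # 4.60517
safe_log(-1)    # NA
safe_log("a")   # the error message, returned as a string
```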

Anonymous Function

• A characteristic of functional languages: functions are values that can be passed to other functions
– apply.to.three <- function(f) {f(3)}
– apply.to.three(function(x) {x * 7})   # returns 21
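Anonymous functions are most often passed to base R's higher-order functions. This sketch shows sapply(), Filter(), Reduce(), and Map() with anonymous functions, plus a closure; make_multiplier() is a hypothetical helper for illustration.

```r
apply.to.three <- function(f) f(3)
apply.to.three(function(x) x * 7)        # 21

# Base R higher-order functions that take anonymous functions
sapply(1:5, function(x) x^2)             # 1 4 9 16 25
Filter(function(x) x %% 2 == 0, 1:10)    # 2 4 6 8 10
Reduce(function(acc, x) acc + x, 1:5)    # 15
Map(function(a, b) a * b, 1:3, 4:6)      # list of 4, 10, 18

# Functions can also return functions (closures)
make_multiplier <- function(k) function(x) x * k
triple <- make_multiplier(3)
triple(10)                               # 30
```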

Objects and Classes

• All R code manipulates objects.

• Every object in R has a type

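• In assignment statements, R copies the object rather than just the reference to it (in practice the copy is made lazily, when one of the two objects is modified)

• Objects can carry attributes, such as names, dim, and class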
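A quick way to check the three points above: typeof() reports an object's type, modifying a copy leaves the original untouched, and attributes() shows the metadata attached to an object. The "units" attribute is an arbitrary example.

```r
x <- c(1, 2, 3)
typeof(x)        # "double"
typeof(1L)       # "integer"
typeof("a")      # "character"
typeof(list())   # "list"

# Assignment behaves like a copy: modifying y leaves x unchanged
y <- x
y[1] <- 100
x                # 1 2 3
y                # 100 2 3

# Attributes attach metadata to an object
attr(x, "units") <- "cm"
names(x) <- c("a", "b", "c")
attributes(x)    # a list holding units and names
```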

S3 & S4 Objects

• Many R functions are implemented using S3 methods

• In S version 4 (hence S4), formal classes and methods were introduced that allow
– Dispatch on multiple arguments
– Abstract types
– Inheritance
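The S4 slide below has its own example, so here is a sketch of the S3 side for comparison: an S3 class is just a class attribute, and a generic dispatches with UseMethod(). The generic describe() and the class "student" are hypothetical names chosen for illustration.

```r
# S3 "class" is just an attribute on an ordinary object
s <- structure(list(name = "david", score = 80), class = "student")

# A generic function dispatches on class(x) via UseMethod()
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "some object"
describe.student <- function(x, ...) paste(x$name, "scored", x$score)

describe(s)     # "david scored 80"
describe(42)    # "some object"

# Existing generics such as print() can be extended the same way
print.student <- function(x, ...) {
  cat("<student:", x$name, ">\n")
  invisible(x)
}
print(s)        # <student: david >
```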

OOP of S4

• S4 OOP example
– setClass("Student", representation(name = "character", score = "numeric"))
– studenta = new("Student", name="david", score=80)
– studentb = new("Student", name="andy", score=90)
– setMethod("show", signature("Student"), function(object) { cat(object@score + 100, "\n") })
– setGeneric("getscore", function(object) standardGeneric("getscore"))
– setMethod("getscore", "Student", function(object) object@score)
– studenta             # auto-printing calls show(): prints 180
– getscore(studentb)   # 90
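The example above does not exercise two features listed under S4: inheritance and formal checking. This sketch redefines Student with a validity function, derives a GradStudent subclass, and uses callNextMethod() to reuse the parent method. The generic describeStudent() and the advisor slot are hypothetical additions for illustration.

```r
library(methods)

# Student with a validity check (a variant of the slide's class)
setClass("Student",
         slots = c(name = "character", score = "numeric"),
         validity = function(object) {
           if (length(object@score) != 1 || object@score < 0 || object@score > 100)
             "score must be a single number between 0 and 100"
           else TRUE
         })

# Inheritance: GradStudent is a Student with one extra slot
setClass("GradStudent", contains = "Student",
         slots = c(advisor = "character"))

setGeneric("describeStudent", function(object) standardGeneric("describeStudent"))
setMethod("describeStudent", "Student",
          function(object) paste(object@name, object@score))
setMethod("describeStudent", "GradStudent", function(object) {
  # callNextMethod() runs the Student method, then we extend its result
  paste(callNextMethod(), "advised by", object@advisor)
})

g <- new("GradStudent", name = "andy", score = 90, advisor = "smith")
is(g, "Student")      # TRUE
describeStudent(g)    # "andy 90 advised by smith"

# The validity function rejects out-of-range data when the object is created
bad <- tryCatch(new("Student", name = "qoo", score = 150),
                error = function(e) "rejected")
bad                   # "rejected"
```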

Packages

• A package is a related set of functions, help files, and data files that have been bundled together.

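• Basic commands
– install.packages("rpart")   # install a package from CRAN
– library(rpart)              # load an installed package
– (.packages())               # list the packages currently attached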
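A common pattern combines the commands above: install a package only when it is missing, then attach it. use_package() is a hypothetical helper, not part of base R; requireNamespace() returns FALSE instead of raising an error when the package is absent.

```r
# Hypothetical helper: install from CRAN only if missing, then attach
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

use_package("tools")        # ships with R, so nothing is downloaded
"tools" %in% (.packages())  # TRUE: tools is now attached
```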

Packages used in Machine Learning for Hackers

Apply

• apply: returns a vector, array, or list of values obtained by applying a function to margins of an array or matrix.
– data <- cbind(c(1,2),c(3,4))
– data.rowsum <- apply(data,1,sum)
– data.colsum <- apply(data,2,sum)
– data
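The MARGIN argument is the part of apply() that trips people up: 1 means rows, 2 means columns. The sketch shows the results for the slide's matrix, the faster built-ins rowSums()/colMeans() for the common case, and passing an extra argument (k, chosen arbitrarily) through apply().

```r
data <- cbind(c(1, 2), c(3, 4))   # 2 x 2 matrix: [1 3; 2 4]

apply(data, 1, sum)   # 4 6 (MARGIN = 1: over rows)
apply(data, 2, sum)   # 3 7 (MARGIN = 2: over columns)

# Equivalent, faster built-ins for sums and means
rowSums(data)         # 4 6
colMeans(data)        # 1.5 3.5

# apply() also accepts an anonymous function and extra arguments
apply(data, 2, function(col, k) col * k, k = 10)   # [10 30; 20 40]
```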

Apply

• lapply: returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.

• sapply: a user-friendly wrapper of lapply that, by default, simplifies the result to a vector or matrix when possible.

• vapply: similar to sapply, but takes a pre-specified type of return value, so it can be safer (and sometimes faster) to use.
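The difference between the three functions shows up at the edges: sapply's return shape depends on the input, while vapply always returns the declared type or fails. A sketch on a small list (the data is arbitrary):

```r
x <- list(a = 1:3, b = c(10, 20), c = numeric(0))

lapply(x, length)                   # always a list: a = 3, b = 2, c = 0
sapply(x, length)                   # simplified to a named vector
vapply(x, length, integer(1))       # same, but the return type is checked

# sapply's result shape depends on the data: empty input gives an empty list
sapply(list(), length)              # list()
vapply(list(), length, integer(1))  # integer(0): the type is still guaranteed

# vapply fails loudly when FUN returns the wrong type
tryCatch(vapply(x, function(v) "oops", integer(1)),
         error = function(e) "type mismatch")   # "type mismatch"
```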

File IO

• Save and load
– x = USPersonalExpenditure
– save(x, file="~/test.RData")
– rm(x)
– load("~/test.RData")
– x
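Besides save()/load(), base R offers saveRDS()/readRDS() for a single object and write.csv()/read.csv() for plain-text tables. This sketch writes to tempfile() paths instead of the home directory, so it leaves no files behind in a real location.

```r
x <- USPersonalExpenditure           # built-in dataset (datasets package)
f <- tempfile(fileext = ".RData")

save(x, file = f)                    # stores the object together with its name
rm(x)
load(f)                              # re-creates x in the current environment
dim(x)                               # 5 5

# saveRDS()/readRDS() store one object without its name,
# so the result can be assigned to any variable when reading
g <- tempfile(fileext = ".rds")
saveRDS(USPersonalExpenditure, g)
y <- readRDS(g)
identical(y, USPersonalExpenditure)  # TRUE

# Plain-text CSV for exchanging tables with other tools
h <- tempfile(fileext = ".csv")
write.csv(USPersonalExpenditure, h)
z <- read.csv(h, row.names = 1, check.names = FALSE)
```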

Charts and Graphics

Plotting Example

– years = as.numeric(colnames(USPersonalExpenditure))
– xrange = range(years); yrange = range(USPersonalExpenditure)
– plot(xrange, yrange, type="n", xlab="Year", ylab="Expenditure (billions of dollars)")
– for (i in 1:nrow(USPersonalExpenditure)) {
    lines(years, USPersonalExpenditure[i,], type="b", lwd=1.5, col=i)
  }
– legend("topleft", legend=rownames(USPersonalExpenditure), col=1:5, lwd=1.5)
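The loop above draws one line per category; matplot() can do the same in a single call because it draws one line per column, so the matrix is transposed first. A sketch, with colors and symbols chosen arbitrarily:

```r
years <- as.numeric(colnames(USPersonalExpenditure))   # 1940 1945 ... 1960

# matplot() draws one line per column: transpose to get one line per category
matplot(years, t(USPersonalExpenditure), type = "b", pch = 1:5, lty = 1,
        col = 1:5, xlab = "Year", ylab = "Expenditure (billions of dollars)")
legend("topleft", legend = rownames(USPersonalExpenditure),
       col = 1:5, pch = 1:5, lty = 1, cex = 0.8)
```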

IRIS Dataset

• data(): list the datasets available in the attached packages; iris is one of them (datasets package)
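Before classifying iris it helps to look at its shape and per-species averages. A short exploration sketch using only base R:

```r
dim(iris)                 # 150 5
str(iris)                 # four numeric measurements plus the Species factor
table(iris$Species)       # 50 setosa, 50 versicolor, 50 virginica

# Mean of each measurement per species
aggregate(. ~ Species, data = iris, FUN = mean)
tapply(iris$Sepal.Length, iris$Species, mean)   # 5.006 5.936 6.588
```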

Classification of IRIS

• Classification example
– install.packages("e1071"); library(e1071)   # naiveBayes() and svm() come from e1071
– pairs(iris[1:4], main="Iris Data (red=setosa,green=versicolor,blue=virginica)", pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])
– classifier <- naiveBayes(iris[,1:4], iris[,5])
– table(predict(classifier, iris[,-5]), iris[,5])
– classifier <- svm(iris[,1:4], iris[,5])
– table(predict(classifier, iris[,-5]), iris[,5])
– prediction = predict(classifier, iris[,1:4])

• http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/Na%C3%AFve_Bayes
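The confusion tables in the classification example are computed on the same rows used for training, which tends to overstate accuracy. The sketch below holds out a random test set instead. To stay dependency-free it swaps in a simple hand-written nearest-centroid classifier rather than e1071's naiveBayes() or svm(); predict_centroid() is a hypothetical helper, and the exact accuracy depends on the random split.

```r
set.seed(42)                                  # reproducible split
train_idx <- sample(nrow(iris), 100)          # 100 rows for training
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]                   # remaining 50 rows for testing

# Nearest-centroid classifier: each species is represented by the
# mean of its training rows
centroids  <- aggregate(. ~ Species, data = train, FUN = mean)
center_mat <- as.matrix(centroids[, -1])
rownames(center_mat) <- as.character(centroids$Species)

predict_centroid <- function(newdata) {
  X <- as.matrix(newdata[, 1:4])
  # squared Euclidean distance from every row to every centroid
  d <- sapply(seq_len(nrow(center_mat)), function(k)
    rowSums(sweep(X, 2, center_mat[k, ])^2))
  # pick the closest centroid for each row
  factor(rownames(center_mat)[max.col(-d, ties.method = "first")],
         levels = levels(iris$Species))
}

pred <- predict_centroid(test)
table(predicted = pred, actual = test$Species)   # held-out confusion matrix
mean(pred == test$Species)                        # held-out accuracy
```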

Performance Tips

• Use Built-in Math Functions

• Use Environments for Lookup Tables

• Use a Database to Query Large Data Sets

• Preallocate Memory

• Monitor How Much Memory You Are Using

• Cleaning Up Objects

• Functions for Big Data Sets

• Parallel Computation with R
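Three of the tips above (preallocate memory, use built-in vectorized math, use environments for lookup tables) plus the memory-monitoring ones can be tried in a few lines. Timings vary by machine and R version, so none are quoted; grow(), prealloc(), and vectorized() are hypothetical names for illustration, and all three return the same result.

```r
n <- 1e5

# Growing a vector inside a loop: R may reallocate as it grows
grow <- function(n) { v <- c(); for (i in 1:n) v[i] <- i^2; v }

# Preallocating the result once avoids repeated reallocation
prealloc <- function(n) { v <- numeric(n); for (i in 1:n) v[i] <- i^2; v }

# Vectorized built-in arithmetic is usually fastest of all
vectorized <- function(n) (1:n)^2

system.time(a <- grow(n))
system.time(b <- prealloc(n))
system.time(d <- vectorized(n))
identical(a, b) && identical(b, d)   # TRUE: same result, different cost

# Environments as hash-based lookup tables
lookup <- new.env(hash = TRUE)
assign("iphone", "Apple", envir = lookup)
assign("htc", "HTC", envir = lookup)
get("iphone", envir = lookup)                       # "Apple"
exists("nokia", envir = lookup, inherits = FALSE)   # FALSE

# Memory: inspect an object's size, then remove objects and collect garbage
print(object.size(a), units = "Kb")
rm(a, b, d); invisible(gc())
```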

R for Machine Learning

Getting Help on a Topic

• ?read.delim – # Access a function's help file

• ??base::delim – # Search for 'delim' in all help files for functions in 'base'

• help.search("delimited") – # Search for 'delimited' in all help files

• RSiteSearch("parsing text") – # Search for the term 'parsing text' on the R site.

Sample Code of Chapter 1

• https://github.com/johnmyleswhite/ML_for_Hackers.git

References & Resources

Study Material

• R in a Nutshell

Online Reference

Community Resources for R help

Resources

• Websites and mailing lists
– Stack Overflow
– Cross Validated
– R-help
– R-devel
– R-sig-*
– Package-specific mailing lists

• Blog
– R-bloggers

• Twitter
– https://twitter.com/#rstats

• Quora
– http://www.quora.com/R-software

Resources (cont'd)

• Conferences
– useR!
– R in Finance
– R in Insurance
– Others: Joint Statistical Meetings, Royal Statistical Society Conference

• Local user groups
– http://blog.revolutionanalytics.com/local-r-groups.html

• Taiwan R User Group
– http://www.facebook.com/Tw.R.User
– http://www.meetup.com/Taiwan-R/

Thank You!