
Machine learning with R


Page 1: Machine learning with R
Page 2: Machine learning with R

Machine learning with R

AMIS Day, April 3rd 2017

Maarten Smeets

Page 3: Machine learning with R

MACHINE LEARNING WITH R

WHAT IS MACHINE LEARNING

USE CASES FOR MACHINE LEARNING

SUPERVISED LEARNING

UNSUPERVISED LEARNING

INTRODUCING R

COOL FEATURES OF R

R AND ORACLE

Page 4: Machine learning with R

MACHINE LEARNING

• Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed.

Page 5: Machine learning with R

MACHINE LEARNING: USE CASES

• E-mail categorization: spam, news, personal, orders, …

• Anomaly detection: fraud detection, behavior that does not fit known classifications well

• Optical character recognition (OCR)

• Genetics: will you have a high chance of relapse if you have this cancer type and these genes?

Page 6: Machine learning with R

MACHINE LEARNING: USE CASES

• Log file analysis: which entries are rare? Which parts of a log line are variables? Intruder detection

• IoT: self-learning thermostats

• Weather prediction: based on environmental measurements such as humidity, air pressure and satellite images

• Trend detection: the number of cases present in the KEI system at Spir-it, and performance

• Image recognition: self-driving cars such as Tesla and BMW

• Stock price prediction: find correlations between stocks and try to find features that can predict future prices

Page 7: Machine learning with R

WHAT IS MACHINE LEARNING

1. Supervised learning

2. Unsupervised learning

Page 8: Machine learning with R

SUPERVISED LEARNING

• The computer is presented with input and desired output

• The goal is to derive a general ruleset to map input to output

• This ruleset can be used to predict the output for new input

Page 9: Machine learning with R

SUPERVISED LEARNING: EXAMPLES

• Linear regression

• Support Vector Regression

• Random forest

• Artificial Neural Networks (ANN)

Page 10: Machine learning with R

SUPERVISED LEARNING: LINEAR REGRESSION

(Screenshots: data, statistics, plot)
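A minimal base-R sketch of the three panels on this slide (data, statistics, plot). The data set is hypothetical: `y` is generated as roughly `3 * x + 2` plus Gaussian noise, so the fitted slope should come out close to 3. `lm`, `summary`, `abline` and `predict` are standard functions from R's stats and graphics packages; the slide's own data is not reproduced here.

```r
# Hypothetical data: y depends linearly on x, plus noise
set.seed(2017)
x <- 1:20
y <- 3 * x + 2 + rnorm(20)
obs <- data.frame(x = x, y = y)

# Statistics: fit the model and inspect coefficients, R-squared and p-values
fit <- lm(y ~ x, data = obs)
summary(fit)

# Plot: the data with the fitted regression line
plot(obs$x, obs$y, xlab = "x", ylab = "y")
abline(fit, col = "red")

# Supervised learning: use the fitted rule to predict output for unseen input
predict(fit, newdata = data.frame(x = c(25, 30)))
```

The fitted model is the "general ruleset" from the supervised learning slide: once `fit` exists, `predict` maps any new `x` to an estimated `y`.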

Page 11: Machine learning with R

SUPERVISED LEARNING: SUPPORT VECTOR REGRESSION

Page 12: Machine learning with R

SUPERVISED LEARNING: SUPPORT VECTOR REGRESSION

http://www.svm-tutorial.com/2014/10/support-vector-regression-r/

Prediction with tuned model
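A sketch in the spirit of the tutorial linked above, assuming the CRAN package e1071 is installed (it provides `svm` and `tune`). The noisy sine data and the tuning grid for `epsilon` and `cost` are illustrative choices, not the values used on the slide. Because `y` is numeric, `svm` performs eps-regression with a radial kernel by default; `tune` runs a cross-validated grid search and keeps the best model.

```r
# Assumes the CRAN package e1071 is installed: install.packages("e1071")
library(e1071)

# Hypothetical data: a noisy sine curve
set.seed(42)
x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.1)
obs <- data.frame(x = x, y = y)

# Untuned model: a numeric response means eps-regression
model <- svm(y ~ x, data = obs)
pred <- predict(model, obs)

# Grid search over epsilon and cost (illustrative ranges)
tuned <- tune(svm, y ~ x, data = obs,
              ranges = list(epsilon = seq(0, 0.5, 0.1), cost = 2^(0:4)))
best_model <- tuned$best.model
pred_tuned <- predict(best_model, obs)

# Compare both fits with the root-mean-square error
rmse <- function(error) sqrt(mean(error^2))
c(untuned = rmse(obs$y - pred), tuned = rmse(obs$y - pred_tuned))

# Prediction with tuned model
plot(obs$x, obs$y)
lines(obs$x, pred_tuned, col = "red")
```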

Page 13: Machine learning with R

SUPERVISED LEARNING: RANDOM FOREST

Page 14: Machine learning with R

SUPERVISED LEARNING: RANDOM FOREST

• Features are used to classify data

• A set of decision trees is generated; each split in a tree only considers a random subset of the features

• Every tree is trained on a random sample of the data

• Splits in a tree are determined by the training data values: where does a split add the most information?

• To make a prediction, the features are put through all decision trees and the resulting classifications are combined (a majority vote for classification, an average for regression)

Page 15: Machine learning with R

SUPERVISED LEARNING: RANDOM FOREST

Page 16: Machine learning with R

SUPERVISED LEARNING: RANDOM FOREST

Page 17: Machine learning with R

SUPERVISED LEARNING: RANDOM FOREST

Variable importance plot

Mainly Y was used in the decision trees to determine the outcome

i (a counter) was not important
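A sketch with the CRAN package randomForest (assumed to be installed). The data is hypothetical and constructed to mirror the importance plot described above: the class label depends only on `Y`, while `i` is a row counter and `X` is noise. Each of the `ntree` trees is grown on a bootstrap sample of the rows, and a prediction is the majority vote over all trees.

```r
# Assumes the CRAN package randomForest is installed
library(randomForest)

set.seed(1)
n <- 200
obs <- data.frame(
  i = seq_len(n),        # a counter, carries no information
  X = runif(n, 0, 10),   # noise
  Y = runif(n, 0, 10)
)
# Hypothetical rule: the class depends only on Y
obs$label <- factor(ifelse(obs$Y > 5, "high", "low"))

rf <- randomForest(label ~ i + X + Y, data = obs, ntree = 200, importance = TRUE)
print(rf)                # includes the out-of-bag error estimate

imp <- importance(rf)    # importance measures per variable
imp[, c("MeanDecreaseAccuracy", "MeanDecreaseGini")]
varImpPlot(rf)           # variable importance plot

# Prediction: every tree votes, the majority class wins
predict(rf, newdata = data.frame(i = 1, X = 3, Y = c(2, 8)))
```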

Page 18: Machine learning with R

SUPERVISED LEARNING: RANDOM FOREST

• Why is it very useful?
  • Data does not have many requirements
  • Can deal with multiple dimensions
  • Makes good predictions in many cases
  • Fast
  • Variable importance can easily be determined

If many features are correlated, a single representative feature can be used

Page 19: Machine learning with R

SUPERVISED LEARNING: ARTIFICIAL NEURAL NETWORKS (ANN)

(Diagram: input goes into a large black box performing magic, which produces output)

Page 20: Machine learning with R

SUPERVISED LEARNING: ARTIFICIAL NEURAL NETWORKS (ANN)

(Diagram: input nodes, hidden nodes and output nodes, from input to output)

Page 21: Machine learning with R

ARTIFICIAL NEURAL NETWORKS (ANN): EXAMPLE, BACKPROPAGATION

• Backpropagation
  1. Nodes have connections, and each connection is assigned a random weight
  2. Provide input and let the network generate output
  3. Compare the generated output with the desired output
  4. Go from the output nodes back to the input nodes and adjust the weights of the connections. Adjusting a little at a time increases learning time, but also accuracy
  5. Repeat from step 2 until the desired error rate is reached

• Learning can adjust either the connection weights or the node activation thresholds
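A minimal base-R sketch of the five backpropagation steps for a network with 2 input nodes, 4 hidden nodes and 1 output node, trained on the XOR problem (an illustrative choice, not from the slides). It assumes sigmoid activations and a squared-error loss; the bias vectors `b1` and `b2` play the role of the node activation thresholds, and `learning_rate` sets how large each small adjustment is.

```r
# Minimal backpropagation sketch: 2 input, 4 hidden, 1 output node,
# sigmoid activations, squared error, trained on XOR (illustrative data)
sigmoid <- function(z) 1 / (1 + exp(-z))

X <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)
target <- matrix(c(0, 1, 1, 0), ncol = 1)

set.seed(123)
n_hidden <- 4
# Step 1: connections start with random weights; biases act as thresholds
W1 <- matrix(runif(2 * n_hidden, -1, 1), nrow = 2)      # input -> hidden
b1 <- runif(n_hidden, -1, 1)
W2 <- matrix(runif(n_hidden, -1, 1), nrow = n_hidden)   # hidden -> output
b2 <- runif(1, -1, 1)

learning_rate <- 0.5   # small steps: slower, but more stable learning
max_epochs <- 10000
errors <- numeric(max_epochs)

for (epoch in seq_len(max_epochs)) {
  # Step 2: forward pass, let the network generate output
  hidden <- sigmoid(sweep(X %*% W1, 2, b1, "+"))
  output <- sigmoid(hidden %*% W2 + b2)

  # Step 3: compare the generated output with the desired output
  err <- output - target
  errors[epoch] <- mean(err^2)
  if (errors[epoch] < 0.01) break   # Step 5: stop at the desired error rate

  # Step 4: go back from output to input and adjust the weights a little
  delta_out <- err * output * (1 - output)
  delta_hidden <- (delta_out %*% t(W2)) * hidden * (1 - hidden)
  W2 <- W2 - learning_rate * t(hidden) %*% delta_out
  b2 <- b2 - learning_rate * sum(delta_out)
  W1 <- W1 - learning_rate * t(X) %*% delta_hidden
  b1 <- b1 - learning_rate * colSums(delta_hidden)
}

errors <- errors[seq_len(epoch)]   # error per epoch actually run
round(output, 2)                   # network output for the four input rows
```

Plotting `errors` shows the error rate dropping over the epochs. With an unlucky seed, plain gradient descent on XOR can stall in a local minimum, which is one reason real projects use packages with better optimizers.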

Page 22: Machine learning with R

ARTIFICIAL NEURAL NETWORKS (ANN): SOME PERSONAL THOUGHTS (AS A NEUROBIOLOGIST)

• Most examples of artificial neural networks do not take into account several properties of biological neural networks:
  • Signals take time to go from A to B
  • Neurons are not arranged in layers. Biological neural networks have a 3D structure with specialized areas
  • Once trained, most artificial neural networks are static and don't learn anymore
  • Biological neural networks implement a wide range of signaling mechanisms per node (neurotransmitters)

• Learning algorithms are not only internal to the neural network. Natural selection also plays a role

Page 23: Machine learning with R

SUPERVISED LEARNING: CHALLENGES

• Requires a training set of inputs and desired outputs

• Training data should be balanced:
  • Correlated features cause biases
  • Outputs should be distributed as evenly as possible
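One way to check and address these balance requirements, sketched in base R on hypothetical labels (90 of class A, 10 of class B). The split keeps the class proportions equal in training and test data (stratified sampling), and the training set is then balanced by downsampling the majority class. Downsampling is only one option; the slide does not prescribe a method, and `sample_per_class` is a helper made up for this example.

```r
# Hypothetical, imbalanced labels: 90 x class A, 10 x class B
set.seed(7)
obs <- data.frame(x = rnorm(100), label = factor(rep(c("A", "B"), c(90, 10))))

prop.table(table(obs$label))   # check how evenly the outputs are distributed

# Hypothetical helper: pick k_fun(class size) random row indices per class
sample_per_class <- function(rows, labels, k_fun) {
  unlist(lapply(split(rows, labels),
                function(idx) idx[sample(length(idx), k_fun(length(idx)))]))
}

# Stratified 70/30 split: both sets keep the original class proportions
train_idx <- sample_per_class(seq_len(nrow(obs)), obs$label, function(n) round(0.7 * n))
train <- obs[train_idx, ]
test  <- obs[-train_idx, ]

# Balance the training set by downsampling the majority class
n_min <- min(table(train$label))
balanced <- train[sample_per_class(seq_len(nrow(train)), train$label,
                                   function(n) n_min), ]
table(balanced$label)
```

Indexing with `idx[sample(length(idx), k)]` instead of `sample(idx, k)` avoids a classic R pitfall: `sample` treats a single number `n` as `1:n`.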

Page 24: Machine learning with R

SUPERVISED LEARNING

(Diagram: training data and test data, each mapping input to output, with the output classes A and B unevenly distributed)

Page 25: Machine learning with R

UNSUPERVISED LEARNING

• Unsupervised machine learning is the machine learning task of inferring a function to describe hidden structure from "unlabeled" data (a classification or categorization is not included in the observations)

• Examples
  • Clustering
  • Anomaly detection
  • Neural networks (self-organizing map)

Page 26: Machine learning with R

HIERARCHICAL CLUSTERING

Every point starts as its own cluster

Clusters are merged step by step moving up the tree

Page 27: Machine learning with R

HIERARCHICAL CLUSTERING

A: mean (2,2), stdev 2; B: mean (6,6), stdev 2

Page 28: Machine learning with R

HIERARCHICAL CLUSTERING (HCL)

Page 29: Machine learning with R

HIERARCHICAL CLUSTERING

A: mean (2,2), stdev 2; B: mean (6,6), stdev 2

(Panels: original, prediction)

Page 30: Machine learning with R

HIERARCHICAL CLUSTERING

A: mean (2,2), stdev 1; B: mean (6,6), stdev 1

(Panels: original, prediction)
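A base-R sketch of the experiment on these slides: two groups of 2-D points drawn around means (2,2) and (6,6), clustered with `hclust` and cut into two clusters with `cutree`. The sample size, the seed and Ward linkage (`method = "ward.D2"`) are assumptions; the slides do not state them. Setting `spread` to 2 reproduces the harder, more overlapping case.

```r
set.seed(2017)
n <- 50
spread <- 1   # standard deviation per coordinate; try 2 for more overlap
group_a <- cbind(rnorm(n, mean = 2, sd = spread), rnorm(n, mean = 2, sd = spread))
group_b <- cbind(rnorm(n, mean = 6, sd = spread), rnorm(n, mean = 6, sd = spread))
pts <- rbind(group_a, group_b)
original <- rep(1:2, each = n)

# Agglomerative clustering: every point starts as its own cluster,
# and the closest clusters are merged step by step up the tree
tree <- hclust(dist(pts), method = "ward.D2")
plot(tree, labels = FALSE)         # dendrogram
predicted <- cutree(tree, k = 2)   # cut the tree into 2 clusters

# Cluster numbers are arbitrary, so allow for swapped labels
agreement <- max(mean(predicted == original), mean(predicted != original))
table(original, predicted)

par(mfrow = c(1, 2))
plot(pts, col = original, main = "Original")
plot(pts, col = predicted, main = "Prediction")
```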

Page 31: Machine learning with R

INTRODUCING R

1. History

2. Installation

3. Basics

Page 32: Machine learning with R

R: A SHORT HISTORY

• Conceived August 1993: an implementation of the S programming language (S was conceived in 1976)

• Open sourced June 1995

• Main competitors: SPSS and SAS

• Many (mostly statistical) libraries available: the CRAN package repository features 10366 available packages (at the time of writing)

Page 33: Machine learning with R

R INSTALLATION

• Download and install R: https://www.r-project.org/

Page 34: Machine learning with R

RSTUDIO INSTALLATION

• Download and install RStudio: https://www.rstudio.com/

Page 35: Machine learning with R

R BASICS

• R is a functional programming (FP) language

• It provides many tools for the creation and manipulation of functions.

• You can do anything with functions that you can do with vectors: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.
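A short base-R illustration of each claim in the last bullet: functions assigned to variables, stored in lists, passed as arguments, and created and returned by other functions (closures). The names `square`, `ops`, `make_multiplier` and `times2` are made up for the example.

```r
# Assign a function to a variable
square <- function(x) x^2

# Store functions in a list
ops <- list(square = square, cube = function(x) x^3)

# Pass functions as arguments to other functions
squares <- sapply(1:5, square)                     # 1 4 9 16 25
results <- Map(function(f, x) f(x), ops, c(2, 3))  # apply each function to its own argument
results$cube                                       # 27

# Create a function inside a function and return it (a closure)
make_multiplier <- function(factor) {
  function(x) x * factor                           # remembers 'factor'
}
times2 <- make_multiplier(2)
times2(21)                                         # 42

# Higher-order helpers from base R
Filter(function(x) x %% 2 == 0, 1:10)              # 2 4 6 8 10
Reduce(`+`, 1:5)                                   # 15
```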

Page 36: Machine learning with R

R BASICS: SOME FEATURES

• Git integration

• Interpreted; does not require compilation. Execute a line in your script and look at the result in the console

• Has its own Markdown variant for documentation (R Markdown), especially useful if you want to include graphs

• R Shiny allows you to build web applications around your scripts and graphs and make them available from a browser

Page 37: Machine learning with R

R BASICS: SOME FEATURES

• Code completion

• Supports parallel execution, for example with the parallel package, which spreads work over multiple worker processes

• Can be run remotely on an R server

• Great at reading and writing data sets, for example by scraping web sites for data

• Of course great at statistics

• Great at generating plots, especially when using the ggplot2 library
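The R interpreter itself runs R code in a single thread; the `parallel` package that ships with R distributes work over several worker processes instead. A minimal sketch with two workers (the worker count and the squaring task are arbitrary choices):

```r
library(parallel)

detectCores()                                # number of cores on this machine

cl <- makeCluster(2)                         # start 2 worker R processes
res <- parLapply(cl, 1:4, function(i) i^2)   # elements are spread over the workers
stopCluster(cl)                              # always shut the workers down

unlist(res)                                  # 1 4 9 16
```

`makeCluster` uses socket clusters by default, which also work on Windows; on Linux and macOS, `mclapply` (which forks the current process) is a lighter-weight alternative.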

Page 38: Machine learning with R

R BASICS: SOME TIPS TO GET STARTED

• ?ggplot (help page for a function)

• help(package = "ggplot2") (overview of a package)

Page 39: Machine learning with R

R DATA TYPES: THE VECTOR

• Vector
a <- c(1, 2, 5.3, 6, -2, 4)                   # numeric vector
b <- c("one", "two", "three")                 # character vector
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)  # logical vector

a <- c(1, 2, 5.3, 6, -2, 4)
b <- a * 2   # arithmetic is vectorized: every element is doubled
b

[1] 2.0 4.0 10.6 12.0 -4.0 8.0

Page 40: Machine learning with R

R DATA TYPES: THE MATRIX. ALL VALUES HAVE THE SAME TYPE AND ALL COLUMNS THE SAME LENGTH

# generate a 5 x 4 numeric matrix
y <- matrix(1:20, nrow = 5, ncol = 4)

# another example: a 2 x 2 matrix with row and column names
cells <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rnames, cnames))

# accessing matrix values
y[, 4]       # 4th column of the matrix
y[3, ]       # 3rd row of the matrix
y[2:4, 1:3]  # rows 2, 3, 4 of columns 1, 2, 3

Page 41: Machine learning with R

R DATA TYPES: THE DATA.FRAME. LIKE A MATRIX, BUT COLUMNS CAN HAVE DIFFERENT TYPES (ALL COLUMNS STILL HAVE THE SAME LENGTH)

d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
names(mydata) <- c("ID", "Color", "Passed")  # variable names

mydata[2:3]               # columns 2 and 3 of the data frame
mydata[c("ID", "Color")]  # columns ID and Color from the data frame
mydata$Passed             # variable Passed in the data frame

Page 42: Machine learning with R

R DATA TYPES: THE LIST

• An ordered collection of objects (components)

# example of a list with 4 components:
# a string, a numeric vector, a matrix, and a scalar
w <- list(name = "Maarten", mynumbers = a, mymatrix = y, age = 36)

# example of a list containing two lists (list1 and list2 are existing lists)
v <- list(list1, list2)

Page 43: Machine learning with R

COOL FEATURES OF R

1. Hosting plots: Shiny, Plot.ly

2. R Markdown

3. Web site crawling

Page 44: Machine learning with R

COOL FEATURES OF R: SHINY

Page 45: Machine learning with R

COOL FEATURES OF R: SHINY

(Panels: UI, server)

Page 46: Machine learning with R

COOL FEATURES OF R: PLOT.LY INTERACTIVE GRAPHS

Page 47: Machine learning with R

COOL FEATURES OF R: PLOT.LY INTERACTIVE GRAPHS

Page 48: Machine learning with R

COOL FEATURES OF R: R MARKDOWN

Page 49: Machine learning with R

COOL FEATURES OF R: R MARKDOWN

Page 50: Machine learning with R

COOL FEATURES OF R: WEB SITE CRAWLING

Page 51: Machine learning with R

COOL FEATURES OF R: WEB SITE CRAWLING

• Crawl from sector to industry, and from industry to company

Page 52: Machine learning with R

COOL FEATURES OF R: WEB SITE CRAWLING

Page 53: Machine learning with R

COOL FEATURES OF R: WEB SITE CRAWLING

http://chart.finance.yahoo.com/table.csv?s=ABT.AX&a=1&b=28&c=2017&d=2&e=28&f=2017&g=d&ignore=.csv
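The query string above encodes the symbol (`s`), the start date (`a` month, `b` day, `c` year), the end date (`d`, `e`, `f`) and the interval (`g=d` for daily). A sketch of a hypothetical helper, `yahoo_csv_url`, that builds such URLs, assuming months are zero-based as in this Yahoo interface as we understand it (so `a=1` means February). Yahoo has since retired this CSV endpoint, so the download line is left commented out.

```r
# Hypothetical helper: build a query URL in the format shown above.
# Assumes zero-based months; POSIXlt$mon is zero-based as well.
yahoo_csv_url <- function(symbol, from, to, interval = "d") {
  from <- as.POSIXlt(as.Date(from))
  to <- as.POSIXlt(as.Date(to))
  sprintf(paste0("http://chart.finance.yahoo.com/table.csv?s=%s",
                 "&a=%d&b=%d&c=%d&d=%d&e=%d&f=%d&g=%s&ignore=.csv"),
          symbol,
          from$mon, from$mday, from$year + 1900L,
          to$mon, to$mday, to$year + 1900L,
          interval)
}

url <- yahoo_csv_url("ABT.AX", "2017-02-28", "2017-03-28")
url
# prices <- read.csv(url)   # download step; the endpoint no longer responds
```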

Page 54: Machine learning with R

ORACLE AND R

1. What does Oracle do with R?

2. Using data from an Oracle DB in R

3. Using functions from R in the Oracle DB

Page 55: Machine learning with R

ORACLE AND R

Page 56: Machine learning with R

ORACLE R ENTERPRISE: USING DATABASE DATA IN R

Page 57: Machine learning with R

ORACLE R ENTERPRISE: USING R SCRIPTS DIRECTLY IN SQL STATEMENTS

Page 58: Machine learning with R

https://github.com/MaartenSmeets/R