Using R to win Kaggle Data Mining Competitions Chris Raimondi November 1, 2012
- Slide 1
- Using R to win Kaggle Data Mining Competitions, Chris Raimondi, November 1, 2012
- Slide 2
- Overview of talk: what I hope you get out of this talk; life before R; a simple model example; the R programming language (background/stats/info); how to get started; Kaggle
- Slide 3
- Overview of talk, individual Kaggle competitions: HIV Progression; Chess; Mapping Dark Matter; Dunnhumby's Shopper Challenge; Online Product Sales
- Slide 4
- What I want you to leave with: the belief that you don't need to be a statistician to use R, NOR do you need to fully understand machine learning in order to use it; the motivation to use Kaggle competitions to learn R; knowledge of how to start
- Slide 5
- My life before R: lots of Excel; had tried programming in the past but got frustrated; read a NY Times article in January 2009 about R & Google; installed R, but gave up after a couple of minutes; months later...
- Slide 6
- My life before R (continued): was using Excel to run PageRank calculations that took hours and were very messy; was experimenting with Pajek, a Windows-based network/link analysis program; was looking for a similar program that did PageRank calculations; revisited R as a possibility
- Slide 7
- My life before R (continued): came across the R Graph Gallery and saw this graph
- Slide 8
- Slide 9
- Addicted to R in one line of code:
  pairs(iris[1:4], main = "Edgar Anderson's Iris Data", pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
  Here pairs is a function and iris is a data frame; iris[1:4] selects its four measurement columns.
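In the one-line pairs() call, the bg argument colours each point by species. The sketch below shows how that indexing works, using only base R and the built-in iris data; the variable names codes and point_colours are ours, for illustration only.

```r
# iris$Species is a factor; its levels are the three species names
levels(iris$Species)  # "setosa" "versicolor" "virginica"

# unclass() strips the factor class, exposing the integer code (1, 2 or 3)
# stored for each row's species
codes <- unclass(iris$Species)

# Indexing a colour vector by those codes yields one colour per row:
# setosa -> "red", versicolor -> "green3", virginica -> "blue"
point_colours <- c("red", "green3", "blue")[codes]
table(point_colours)
```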
- Slide 10
- What do we want to do with R? Machine learning, or more specifically, making models: we want to TRAIN on a set of data with KNOWN answers/outcomes in order to PREDICT the answer/outcome for similar data where the answer is not known
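A minimal sketch of the train-then-predict idea, assuming the rpart package (recursive partitioning, a recommended package usually installed with R) as the learning method; the talk itself uses the party package for its decision tree, and the 100/50 split and seed here are arbitrary choices of ours. We pretend the last 50 rows are "new" data, then score the predictions because their answers happen to be known.

```r
library(rpart)  # recursive partitioning (decision trees)

# Split the built-in iris data: 100 rows play the role of data with KNOWN
# answers; the remaining 50 rows play the role of "new" data.
set.seed(42)  # make the random split reproducible
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# TRAIN: learn Species from the four measurements, using known answers only
model <- rpart(Species ~ ., data = train)

# PREDICT: the fitted tree uses only the feature columns of 'test'
pred <- predict(model, newdata = test, type = "class")

# The held-out answers are actually known here, so we can check the model
accuracy <- mean(pred == test$Species)
accuracy
```

In a Kaggle competition the "test" rows would be the competition's unlabelled data, so this last accuracy check would not be possible locally.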
- Slide 11
- Slide 12
- How to train a model: R allows for the training of models using probably over 100 different machine learning methods. To train a model you need to provide:
  1. the name of the function (i.e. which machine learning method)
  2. the name of the dataset
  3. your response variable and the features you are going to use
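The three ingredients map directly onto a typical R modelling call. This is a sketch assuming base R's lm() (linear regression) as the method and the built-in iris data; the choice of response and features is ours, for illustration. The formula puts the response on the left of ~ and the features on the right.

```r
# 1. Function (learning method): lm(), ordinary linear regression
# 2. Dataset: iris, passed via the data argument
# 3. Response ~ features: Petal.Width predicted from Petal.Length and Sepal.Length
fit <- lm(Petal.Width ~ Petal.Length + Sepal.Length, data = iris)

# Fitted coefficients: an intercept plus one slope per feature
coef(fit)

# The shorthand "response ~ ." uses every other column as a feature;
# the factor Species contributes two dummy-coded coefficients
fit_all <- lm(Petal.Width ~ ., data = iris)
length(coef(fit_all))
```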
- Slide 13
- Example machine learning methods available in R: Bagging, Boosted Trees, Elastic Net, Gaussian Processes, Generalized Additive Model, Generalized Linear Model, K Nearest Neighbor, Linear Regression, Nearest Shrunken Centroids, Neural Networks, Partial Least Squares, Principal Component Regression, Projection Pursuit Regression, Quadratic Discriminant Analysis, Random Forests, Recursive Partitioning, Rule-Based Models, Self-Organizing Maps, Sparse Linear Discriminant Analysis, Support Vector Machines
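Many of these methods share R's formula interface, so switching methods can mean changing little more than the function name. A sketch, assuming the MASS package (for Quadratic Discriminant Analysis via qda()) and the rpart package (for Recursive Partitioning), both recommended packages that normally ship with R; note that the two predict() methods return their class predictions in slightly different shapes.

```r
library(MASS)   # provides qda(): Quadratic Discriminant Analysis
library(rpart)  # provides rpart(): Recursive Partitioning

# Same formula and dataset for both methods; only the function changes
f <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

qda_fit   <- qda(f, data = iris)
rpart_fit <- rpart(f, data = iris)

# predict.qda returns a list whose $class element holds the classes;
# predict.rpart returns them directly when type = "class"
qda_pred   <- predict(qda_fit, newdata = iris)$class
rpart_pred <- predict(rpart_fit, newdata = iris, type = "class")

# Training-set accuracy (optimistic: each model is scored on the data
# it was fitted to)
acc <- c(qda   = mean(qda_pred == iris$Species),
         rpart = mean(rpart_pred == iris$Species))
acc
```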
- Slide 14
- Code used to train decision tree library(party) irisct