Data Science with R for Java Developers
@Sander_Mak
Agenda
Data Science
The R language
Gimme some Java!
90% of the world’s data was produced in the last 2 years
- SINTEF/ScienceDaily, June 2013
We need more than just CRUD
Stand back.
I know Data Science!
Hacking Skills
Domain Expertise
Data Science
Machine Learning
Operations Research
Danger! Perl ahead!
Math & Statistics
Data Science: Achievement Unlocked
R, R-Studio
Today
Agenda
Data Science
The R language
Gimme some Java!
Language Designers? Statisticians?
The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians. - Bo Cowgill, Google
Why R, then?
Open Source
De-facto standard (in statistical research)
“It’s a DSL posing as general purpose language”
Interactive data exploration
Why not R, then?
‘If you are using R and you think you’re in hell, this is a map for you.’
- The R Inferno
Slow
Memory Bound
(Did I mention it’s a quirky language?)
Try googling for R...
Apparently, statisticians aren’t designers, either...
R vs. Java
R: Dynamic (eval), Interpreted, Functional/OO/Procedural
Java: Static types, Compiled, OO
R type vs. Java type
Factor vs. Enum
numeric vs. Integer/Double/...
character vs. String
vector, list, dataframe (R data structures)
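To make the Factor/Enum pairing concrete: an R factor holds values drawn from a fixed set of levels, much like a Java enum. A rough sketch, assuming Java 8 or later; the `Sex` enum and the `asFactor` helper are hypothetical names, not from the talk:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class FactorDemo {
    // Hypothetical enum playing the role of the factor's levels.
    enum Sex { FEMALE, MALE }

    // Maps raw strings onto enum constants, loosely like applying factor()
    // to a character vector. Unknown values throw IllegalArgumentException.
    static List<Sex> asFactor(List<String> raw) {
        return raw.stream()
                  .map(s -> Sex.valueOf(s.toUpperCase(Locale.ROOT)))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(asFactor(Arrays.asList("female", "male")));
    }
}
```

One difference worth keeping in mind: R can derive the levels from the data at runtime, while the enum's constants are fixed at compile time.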
1-based vs. 0-based indexing
R: 1 2 3 4
Java: 0 1 2 3
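The off-by-one difference matters most when positions are passed between the two languages. A minimal sketch of the conversion; `toJavaIndex` is a hypothetical helper, and it deliberately ignores R's negative (exclusion) indices:

```java
public class IndexDemo {
    // Hypothetical helper: translates a 1-based R position
    // into a 0-based Java array index.
    static int toJavaIndex(int rIndex) {
        if (rIndex < 1) {
            throw new IllegalArgumentException("R positions start at 1");
        }
        return rIndex - 1;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30, 40};
        // In R, values[1] is the first element; in Java it is values[0].
        System.out.println(values[toJavaIndex(1)]);
    }
}
```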
for-loops
higher-order functions
sapply(vec, function(elm) { elm + 1; })
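In Java 8 and later, a stream `map` plays a similar role to the sapply call above. A minimal sketch; the class and method names (`SapplyDemo`, `plusOne`) are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SapplyDemo {
    // Hypothetical helper: applies elm -> elm + 1 to every element,
    // mirroring sapply(vec, function(elm) { elm + 1 }) in R.
    static List<Integer> plusOne(List<Integer> vec) {
        return vec.stream()
                  .map(elm -> elm + 1)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(plusOne(Arrays.asList(1, 2, 3))); // prints [2, 3, 4]
    }
}
```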
RStudio
Central
Comprehensive R Archive Network
Coding time!
Titanic Competition: Machine Learning from Disaster
Survived?
Decision Tree
[Diagram: decision tree with splits on Sex == Female, Age > 50, Age > 16 and Fare > 100, ending in T/F leaves]
Random Forest
[Diagram: several decision trees side by side, each ending in T/F leaves]
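The idea behind the two diagrams can be sketched in plain Java: each tree is a chain of yes/no questions, and a forest takes a majority vote over several trees. The split conditions below come from the slide, but how they are nested, what each leaf predicts, and all names (`ForestDemo`, `Passenger`, `treeA`, `predict`) are assumptions for illustration. A real random forest learns its trees from bootstrap samples of the training data rather than having them written by hand.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class ForestDemo {
    // Hypothetical passenger record with the features named on the slide.
    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // One hand-written tree: split on sex first, then on age.
    // Returns true for "survived". The nesting is an assumption.
    static boolean treeA(Passenger p) {
        if (p.female) {
            return !(p.age > 50);
        }
        return !(p.age > 16);
    }

    // A tiny "forest": three trees, two of them single-split stumps.
    static final List<Predicate<Passenger>> FOREST =
        Arrays.<Predicate<Passenger>>asList(
            ForestDemo::treeA,
            p -> p.fare > 100,
            p -> p.female
        );

    // Majority vote: predict survival if more than half the trees say so.
    static boolean predict(Passenger p) {
        long yes = FOREST.stream().filter(tree -> tree.test(p)).count();
        return yes * 2 > FOREST.size();
    }

    public static void main(String[] args) {
        System.out.println(predict(new Passenger(true, 30, 50)));
        System.out.println(predict(new Passenger(false, 40, 20)));
    }
}
```

Voting over several differently built trees is what makes the forest less sensitive to the quirks of any single tree.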
Demo time!
Agenda
Data Science
The R language
Gimme some Java!
Bridging R and Java
Integrate
Assimilate
Replace
Integrate
rJava & Java/R Interface (JRI)
Two-way native interface: JNI (libjri) or TCP to Rserve
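import org.rosuda.JRI.REXP;
import org.rosuda.JRI.RVector;
import org.rosuda.JRI.Rengine;

// start an embedded R engine (no R arguments, no main loop, no callbacks)
Rengine re = new Rengine(new String[] {}, false, null);
// wait until engine is ready
if (!re.waitForR()) {
    throw new IllegalStateException("Can't load R engine");
}
// load the built-in cars dataset, then fetch it as an R expression
re.eval("data(cars)", false);
REXP cars = re.eval("cars");
RVector carsVector = cars.asVector();
// dissect carsVector...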
Assimilate
Reimplementation of R on JVM
Fast & lean
Parallelized
Just-another-lib
... not production ready yet...
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// create a script engine manager
ScriptEngineManager factory = new ScriptEngineManager();
// create an R engine
ScriptEngine engine = factory.getEngineByName("Renjin");
// load package from classpath
engine.eval("library(survey)");
// evaluate R code from String
engine.eval("print('Hello from R')");
Big Data?
Replace: JVM libraries/platforms
Replace: Scalable R distributions (non-JVM)
Revolution Analytics
Oracle R Enterprise
Wrap-up
Data Science
The R language
Gimme some Java!
Sanitize, Explore, Model, Predict, Scale
Next steps
Computing for Data Analysis starts Jan. 6th 2014
Install R
Read
Questions?
@Sander_Mak
branchandbound.net