Data Science with R for Java Developers

Page 1: Data Science with R for Java Developers

Data Science with R for Java Developers

@Sander_Mak

Page 2: Data Science with R for Java Developers

Agenda

Data Science

The R language

Gimme some Java!

Page 3: Data Science with R for Java Developers

90% of the world’s data was produced in the last 2 years

- SINTEF/ScienceDaily, June 2013

We need more than just CRUD

Page 4: Data Science with R for Java Developers

Stand back.

I know Data Science!

Page 5: Data Science with R for Java Developers

Hacking Skills

Domain Expertise

Data Science

Machine Learning

Operations Research

Danger! Perl ahead!

Math & Statistics

Page 7: Data Science with R for Java Developers

Data Science: Achievement Unlocked

Page 8: Data Science with R for Java Developers

Today: R, RStudio

Data Science: Achievement Unlocked

Page 9: Data Science with R for Java Developers

Agenda

Data Science

The R language

Gimme some Java!

Page 11: Data Science with R for Java Developers

Language Designers? Statisticians?

The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians. - Bo Cowgill, Google

Page 12: Data Science with R for Java Developers

Why R, then?

Open Source

De-facto standard (in statistical research)

“It’s a DSL posing as a general-purpose language”

Interactive data exploration

Page 14: Data Science with R for Java Developers

Why not R, then?

‘If you are using R and you think you’re in hell, this is a map for you.’

- The R Inferno

Slow

Memory Bound

(Did I mention it’s a quirky language?)

Try googling for R...

Page 15: Data Science with R for Java Developers

Apparently, statisticians aren’t designers, either...

Page 16: Data Science with R for Java Developers

R vs. Java

Page 17: Data Science with R for Java Developers

R                          Java
Dynamic (eval)             Static types
Interpreted                Compiled
Functional/OO/Procedural   OO

Page 19: Data Science with R for Java Developers

R                          Java
factor                     Enum
numeric                    Integer/Double/...
character                  String
vector, list, dataframe
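The factor/Enum pairing above can be made concrete with a small Java sketch. It is an illustration only: the `Sex` levels and the `FactorDemo` helper names are hypothetical and not from the talk. The sketch assumes Java 9+ for `List.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class FactorDemo {
    // Hypothetical levels, chosen for illustration only
    enum Sex { FEMALE, MALE }

    // Turns raw strings into enum constants, roughly what factor() does
    // for a character vector in R. Unknown strings throw
    // IllegalArgumentException here.
    static List<Sex> toFactor(List<String> raw) {
        List<Sex> out = new ArrayList<>();
        for (String s : raw) {
            out.add(Sex.valueOf(s.toUpperCase(Locale.ROOT)));
        }
        return out;
    }

    // R stores factor values as 1-based integer codes; ordinal() is 0-based
    static int code(Sex level) {
        return level.ordinal() + 1;
    }

    public static void main(String[] args) {
        List<Sex> sex = toFactor(List.of("male", "female", "female"));
        for (Sex s : sex) {
            System.out.println(s + " -> code " + code(s));
        }
    }
}
```

As with a factor, the set of allowed values is fixed up front; the difference is that Java checks it at compile time, while R checks it when the factor is built.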

Page 21: Data Science with R for Java Developers

R                          Java
1-based: 1 2 3 4           0-based: 0 1 2 3

for-loops

higher-order functions: sapply(vec, function(elm) { elm + 1; })
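For comparison, here is a rough Java counterpart of the `sapply` call, sketched with the Stream API. The class and method names (`SapplyDemo`, `addOne`) are made up for this example, and it assumes Java 9+ for `List.of`.

```java
import java.util.List;
import java.util.stream.Collectors;

final class SapplyDemo {
    // Rough counterpart of R's sapply(vec, function(elm) { elm + 1 })
    static List<Integer> addOne(List<Integer> vec) {
        return vec.stream()
                  .map(elm -> elm + 1)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> vec = List.of(1, 2, 3, 4);
        // Java lists are 0-based: get(0) is the element R calls vec[1]
        System.out.println(vec.get(0));   // prints 1
        System.out.println(addOne(vec));  // prints [2, 3, 4, 5]
    }
}
```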

Page 22: Data Science with R for Java Developers

RStudio

Page 23: Data Science with R for Java Developers

Central

Comprehensive R Archive Network (CRAN)

RStudio

Page 24: Data Science with R for Java Developers

Coding time!

Page 28: Data Science with R for Java Developers

Titanic Competition: Machine Learning from Disaster

Decision Tree

[Figure: decision tree with splits Sex == Female, Age > 50, Age > 16, Fare > 100 and T/F leaves]

Random Forest

[Figure: multiple decision trees, each with T/F leaves]
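The idea on this slide can be sketched in plain Java: one tree is a chain of yes/no questions, and a forest predicts by majority vote over several trees. Everything concrete below is an assumption made for illustration: the `Passenger` fields, the branch layout, and the leaf values are guesses, since only the split conditions survive in the slide text. A real random forest also learns its trees from bootstrap samples of the training data, whereas these are written by hand.

```java
import java.util.List;
import java.util.function.Predicate;

final class TitanicTrees {
    // Hypothetical passenger record; field names are assumptions
    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // One hand-written tree using the split conditions named on the slide.
    // The branch layout and leaf values are guesses, not the slide's tree.
    static boolean survivesTree(Passenger p) {
        if (p.female) {
            return !(p.age > 50);
        }
        if (p.age > 16) {
            return p.fare > 100;
        }
        return true;
    }

    // A toy "forest": three hand-written trees instead of trained ones
    static List<Predicate<Passenger>> demoForest() {
        return List.of(TitanicTrees::survivesTree, p -> p.female, p -> p.fare > 100);
    }

    // Forest prediction: true if more than half of the trees vote true
    static boolean majorityVote(List<Predicate<Passenger>> trees, Passenger p) {
        long yes = trees.stream().filter(t -> t.test(p)).count();
        return 2 * yes > trees.size();
    }

    public static void main(String[] args) {
        Passenger p = new Passenger(true, 30, 10);
        System.out.println("single tree: " + survivesTree(p));
        System.out.println("forest vote: " + majorityVote(demoForest(), p));
    }
}
```

Voting across many differing trees is what makes a forest less sensitive to the quirks of any single tree.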

Page 29: Data Science with R for Java Developers

Demo time!

Page 30: Data Science with R for Java Developers

...

...

Page 31: Data Science with R for Java Developers

Agenda

Data Science

The R language

Gimme some Java!

Page 32: Data Science with R for Java Developers

Bridging R and Java

Integrate

Assimilate

Replace

Page 33: Data Science with R for Java Developers

Integrate

rJava & JRI (Java/R Interface)

Two-way native interface - JNI: libjri - or TCP to Rserve

import org.rosuda.JRI.REXP;
import org.rosuda.JRI.RVector;
import org.rosuda.JRI.Rengine;

Rengine re = new Rengine(new String[] {}, false, null);
// wait until engine is ready
if (!re.waitForR()) {
    throw new IllegalStateException("Can't load R engine");
}
// load the built-in cars data set, then fetch it as an R expression
re.eval("data(cars)", false);
REXP cars = re.eval("cars");
RVector carsVector = cars.asVector();
// dissect carsVector...

Page 35: Data Science with R for Java Developers

Assimilate

Renjin: reimplementation of R on the JVM

Fast & lean

Parallelized

Just-another-lib

... not production ready yet...

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// create a script engine manager
ScriptEngineManager factory = new ScriptEngineManager();
// create an R engine
ScriptEngine engine = factory.getEngineByName("Renjin");
// load package from classpath
engine.eval("library(survey)");
// evaluate R code from String
engine.eval("print('Hello from R')");

Page 36: Data Science with R for Java Developers

Big Data?

Page 37: Data Science with R for Java Developers

Replace

JVM libraries/platforms

Page 38: Data Science with R for Java Developers

Replace

Scalable R distributions (non-JVM)

Revolution Analytics

Oracle R Enterprise

Page 39: Data Science with R for Java Developers

Wrap-up

Data Science

The R language

Gimme some Java!

Page 40: Data Science with R for Java Developers

Sanitize

Explore

Model

Predict

Scale

Page 41: Data Science with R for Java Developers

Next steps

Computing for Data Analysis starts Jan. 6th, 2014

Install R

Read

Page 42: Data Science with R for Java Developers

Questions?

Data Science

The R language

Gimme some Java!

@Sander_Mak

branchandbound.net