Renjin: The new R interpreter built on the JVM Alexander Bertram BeDataDriven

Page 1: Renjin: The new R interpreter built on the JVM · What? Renjin is a new interpreter for the R language. Core & Builtins Written in Java Base Stats Graphics GNU R Language Packages

Renjin: The new R interpreter built on the JVM

Alexander Bertram

BeDataDriven

Page 2

What?

Renjin is a new interpreter for the R language.

- Core & Builtins: written in Java
- Base, Stats, Graphics: GNU R language packages

Page 3

Why?

- Memory
- Speed
- Performance
- Easier Integration

Java Virtual Machine: GC, JIT, Parallelism, 500k libs, tools

Page 4

Sure, but why Renjin?

Packages (biglm, bigvis, scaleR)

+ High performance for specific applications
- Require rewriting existing code
- Limited applicability

Forks (pqR)

+ Marginal improvements for all code
- Unable to address underlying limitations of the GNU R interpreter

Page 5

What do I get, like, today?

Page 6

Flexible

> renjin -f myscript.R

Command-Line Interpreter

Embeddable Java Library

Web-based REPL

Page 7

Java Virtual Machine

Renjin Session 1 Renjin Session 2 Renjin Session 3

Vector

Web Request Web Request Web Request

Immutable Data Structures

Multiple In-process sessions, Shared Data
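The idea on this slide can be sketched in plain Java. This is an illustration only and does not use Renjin's API: `SharedVector`, `Session`, and `SharedSessionsDemo` are hypothetical names, and a parallel stream stands in for concurrent web requests. Each simulated request gets its own session with private bindings, while all of them read the same immutable vector without copying it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical stand-in for an immutable vector; not a Renjin class.
final class SharedVector {
    private final double[] values;

    SharedVector(double[] values) {
        this.values = values.clone(); // defensive copy: callers cannot mutate it later
    }

    int length() { return values.length; }

    double get(int index) { return values[index]; }
}

// Hypothetical session: private variable bindings, shared read-only data.
final class Session {
    private final Map<String, Object> bindings = new HashMap<>();

    void assign(String name, Object value) { bindings.put(name, value); }

    Object lookup(String name) { return bindings.get(name); }

    double sumOf(String name) {
        SharedVector v = (SharedVector) bindings.get(name);
        double total = 0;
        for (int i = 0; i < v.length(); i++) {
            total += v.get(i);
        }
        return total;
    }
}

final class SharedSessionsDemo {
    // Serves `requests` simulated web requests, each in its own session,
    // all reading the same vector instance.
    static List<Double> serve(SharedVector shared, int requests) {
        return IntStream.range(0, requests)
                .parallel()
                .mapToObj(request -> {
                    Session session = new Session();
                    session.assign("x", shared);                        // shared reference, no copy
                    session.assign("offset", Double.valueOf(request));  // session-private state
                    double offset = (Double) session.lookup("offset");
                    return Double.valueOf(session.sumOf("x") + offset);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SharedVector x = new SharedVector(new double[] {1, 2, 3});
        System.out.println(serve(x, 3)); // prints [6.0, 7.0, 8.0]
    }
}
```

Because the vector never changes after construction, no locking is needed when several sessions read it at once; only the per-session bindings are mutable, and they are never shared.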

Page 8

Memory Efficiency

#                             GNU R      Renjin
x <- runif(1e8)             # +721 MB    +721 MB
y <- x + 1                  # +761 MB
comment(y) <- "important!"  # +763 MB

Vector Interface

- getAttributes()
- length()
- getElement(int index)
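One way an interface with these three methods can avoid the extra allocations in the table is to represent `x + 1` and `comment(y) <- ...` as views over the original data. The sketch below is a guess at the general technique, not Renjin's implementation: the interface name `RVector`, the `double` element type, the attribute map type, and the three classes are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface modeled on the three methods named on the slide.
interface RVector {
    int length();
    double getElement(int index);
    Map<String, Object> getAttributes();
}

// Backed by real storage: the only class here that holds per-element memory.
final class ArrayBackedVector implements RVector {
    private final double[] values;

    ArrayBackedVector(double[] values) { this.values = values.clone(); }

    @Override public int length() { return values.length; }
    @Override public double getElement(int index) { return values[index]; }
    @Override public Map<String, Object> getAttributes() { return Collections.emptyMap(); }
}

// View for `x + constant`: each element is computed on demand from the source.
final class PlusConstantView implements RVector {
    private final RVector source;
    private final double constant;

    PlusConstantView(RVector source, double constant) {
        this.source = source;
        this.constant = constant;
    }

    @Override public int length() { return source.length(); }
    @Override public double getElement(int index) { return source.getElement(index) + constant; }
    @Override public Map<String, Object> getAttributes() { return source.getAttributes(); }
}

// View that adds one attribute; element access is delegated, so no elements are copied.
final class WithAttributeView implements RVector {
    private final RVector source;
    private final Map<String, Object> attributes;

    WithAttributeView(RVector source, String name, Object value) {
        this.source = source;
        Map<String, Object> merged = new HashMap<>(source.getAttributes());
        merged.put(name, value);
        this.attributes = Collections.unmodifiableMap(merged);
    }

    @Override public int length() { return source.length(); }
    @Override public double getElement(int index) { return source.getElement(index); }
    @Override public Map<String, Object> getAttributes() { return attributes; }
}

final class VectorViewDemo {
    public static void main(String[] args) {
        RVector x = new ArrayBackedVector(new double[] {0.25, 0.5, 0.75});
        RVector y = new PlusConstantView(x, 1.0);                        // y <- x + 1
        RVector z = new WithAttributeView(y, "comment", "important!");   // comment(y) <- "important!"
        System.out.println(z.getElement(1));    // prints 1.5
        System.out.println(z.getAttributes());  // prints {comment=important!}
    }
}
```

The trade-off in this sketch is that every read of a view recomputes the element, so a real implementation would need some policy for when to materialize a view into an array.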

Page 9

packages.renjin.org

- Proper Dependency Management
- Pre-built Package Repository
- Automated Testing of Renjin
- Translation of C/Fortran to JVM Bytecode

Page 10

Seamless Access to Java/Scala Classes

import(com.acme.Customer)

bob <- Customer$new(name='Bob', age=36)

carol <- Customer$new(name='Carole', age=41)

bob$name <- "Bob II"

cat(c("Name: ", bob$name, "; Age: ", bob$age))

Page 11

Simple to embed in larger systems

// create a script engine manager

ScriptEngineManager factory = new ScriptEngineManager();

// create an R engine

ScriptEngine engine = factory.getEngineByName("Renjin");

// load package from classpath

engine.eval("library(survey)");

// evaluate R code from String

engine.eval("print('Hello, World')");

// evaluate R script on disk

engine.eval(new FileReader("myscript.R"));

// evaluate R script from classpath

engine.eval(new InputStreamReader(
    getClass().getResourceAsStream("myScript.R")));
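One practical detail when embedding through `javax.script`: `ScriptEngineManager.getEngineByName` returns `null` when no engine with that name is registered, for example when the Renjin JAR is missing from the classpath. The helper below is a small sketch of guarding against that; the class name `EngineLookup` and its `find` method are hypothetical, and only the standard `javax.script` API is used.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class EngineLookup {
    // Returns the named engine, or an empty Optional if no such engine is
    // registered (getEngineByName returns null in that case).
    static Optional<ScriptEngine> find(String name) {
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName(name));
    }

    public static void main(String[] args) throws ScriptException {
        Optional<ScriptEngine> engine = find("Renjin");
        if (engine.isPresent()) {
            engine.get().eval("print('Hello, World')");
        } else {
            System.err.println("Renjin engine not found; is the Renjin JAR on the classpath?");
        }
    }
}
```

Failing early with a clear message is usually easier to debug than the `NullPointerException` that the first `engine.eval(...)` call would otherwise throw.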

Page 12

Package Development in Java

@DataParallel
@Deferrable
public static String chartr(
    String oldChars,
    String newChars,
    @Recycle String x) {
  StringBuilder translation = new StringBuilder(x.length());
  for(int i=0;i!=x.length();++i) {
    int codePoint = x.codePointAt(i);
    int charIndex = oldChars.indexOf(codePoint);
    if(charIndex == -1) {
      translation.appendCodePoint(codePoint);
    } else {
      translation.appendCodePoint(
          newChars.codePointAt(charIndex));
    }
  }
  return translation.toString();
}
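The slide's function is written for a single string, yet R's `chartr` accepts a character vector. Presumably the `@Recycle` annotation tells Renjin to generate the element-wise loop around the scalar function. The sketch below writes such a loop by hand to show the idea; it is an assumption about what the annotation achieves, not Renjin's generated code, and `RecycleSketch` and `chartrAll` are hypothetical names. The scalar kernel is adapted from the slide, with the loop advancing by whole code points.

```java
final class RecycleSketch {
    // Scalar kernel adapted from the slide, without the Renjin annotations.
    static String chartr(String oldChars, String newChars, String x) {
        StringBuilder translation = new StringBuilder(x.length());
        for (int i = 0; i < x.length(); ) {
            int codePoint = x.codePointAt(i);
            int charIndex = oldChars.indexOf(codePoint);
            if (charIndex == -1) {
                translation.appendCodePoint(codePoint);          // no mapping: keep as-is
            } else {
                translation.appendCodePoint(newChars.codePointAt(charIndex));
            }
            i += Character.charCount(codePoint);                 // step over surrogate pairs
        }
        return translation.toString();
    }

    // Hand-written stand-in for a vectorizing wrapper: applies the scalar
    // kernel to every element of the input vector.
    static String[] chartrAll(String oldChars, String newChars, String[] xs) {
        String[] out = new String[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = chartr(oldChars, newChars, xs[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] result = chartrAll("abc", "xyz", new String[] {"cab", "dad"});
        System.out.println(String.join(",", result)); // prints zxy,dxd
    }
}
```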

Page 13

Under the hood

Page 14

Specialized Execution Modes

“Slow” AST Interpreter

- Supports full dynamism of R
- Compute on the language

Vector Pipeliner

- Acts like a query planner
- Batches, auto-parallelizes vector workflows

Scalar Compiler

- Partially evaluates & compiles loop bodies, apply functions to JVM byte code

Page 15

Queuing up work for the Vector Pipeliner

x <- runif(1e6)

y <- sqrt(x + 1)

z <- mean(y) - mean(x)

attr(z, 'comments') <- 'still not computed'

print(length(z)) # prints "1" but doesn't evaluate the mean

print(z) # triggers computation
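The behaviour shown above, where the length is known immediately but the value is only computed when printed, can be sketched in plain Java with a deferred value. This is an illustration of the general technique under stated assumptions, not Renjin's Vector Pipeliner: `DeferredScalar` and `DeferredDemo` are hypothetical names, and the sketch is single-threaded.

```java
import java.util.function.DoubleSupplier;

// Hypothetical deferred scalar: knows its length up front and computes
// its value on first use. Not thread-safe.
final class DeferredScalar {
    private final DoubleSupplier computation;
    private boolean evaluated = false;
    private double value;

    DeferredScalar(DoubleSupplier computation) { this.computation = computation; }

    int length() { return 1; } // answerable without running the computation

    boolean isEvaluated() { return evaluated; }

    double force() {
        if (!evaluated) {
            value = computation.getAsDouble();
            evaluated = true;
        }
        return value;
    }
}

final class DeferredDemo {
    static double mean(double[] xs) {
        double total = 0;
        for (double v : xs) {
            total += v;
        }
        return total / xs.length;
    }

    // Queues z = mean(sqrt(x + 1)) - mean(x) without computing it.
    static DeferredScalar build(double[] x) {
        return new DeferredScalar(() -> {
            double[] y = new double[x.length];
            for (int i = 0; i < x.length; i++) {
                y[i] = Math.sqrt(x[i] + 1);
            }
            return mean(y) - mean(x);
        });
    }

    public static void main(String[] args) {
        DeferredScalar z = build(new double[] {0, 3, 8, 15});
        System.out.println(z.length());      // prints 1
        System.out.println(z.isEvaluated()); // prints false
        System.out.println(z.force());       // prints -4.0
        System.out.println(z.isEvaluated()); // prints true
    }
}
```

Deferring the work this way is what gives a planner the chance to see the whole expression before running it, which is the precondition for batching or parallelizing it.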

Page 16

x <- runif(1e6)

y <- sqrt(x + 1)

z <- mean(y) - mean(x)

Page 17

Real-world case study:

Distance Correlation in the Energy Package

Page 18

Distance correlation: robust measure of association. Zero if and only if variables are independent.
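For reference, the standard sample version of distance correlation can be written as follows. The notation is an assumption chosen to line up with the `dcor` function on the next slide with `index = 1`: $a_{kl}$ and $b_{kl}$ are pairwise Euclidean distances, and $A_{kl}$, $B_{kl}$ are their double-centered versions (the `Akl` helper).

```latex
\begin{align*}
a_{kl} &= \lVert x_k - x_l \rVert, \qquad
b_{kl} = \lVert y_k - y_l \rVert, \qquad k, l = 1, \dots, n \\
A_{kl} &= a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot},
\qquad
B_{kl} = b_{kl} - \bar{b}_{k\cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot\cdot} \\
\mathrm{dCov}_n^2(x, y) &= \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} A_{kl} B_{kl} \\
\mathrm{dVar}_n^2(x) &= \mathrm{dCov}_n^2(x, x) \\
\mathrm{dCor}_n(x, y) &=
\begin{cases}
\dfrac{\mathrm{dCov}_n(x, y)}{\sqrt{\mathrm{dVar}_n(x)\,\mathrm{dVar}_n(y)}}
  & \text{if } \mathrm{dVar}_n(x)\,\mathrm{dVar}_n(y) > 0 \\[1ex]
0 & \text{otherwise}
\end{cases}
\end{align*}
```

Here $\bar{a}_{k\cdot}$ is the mean of row $k$, $\bar{a}_{\cdot l}$ the mean of column $l$, and $\bar{a}_{\cdot\cdot}$ the grand mean of the distance matrix.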

Page 19

dcor <- function (x, y, index = 1) {
  x <- as.matrix(dist(x))
  y <- as.matrix(dist(y))
  n <- nrow(x)
  m <- nrow(y)
  dims <- c(n, ncol(x), ncol(y))
  Akl <- function(x) {
    d <- as.matrix(x)^index
    m <- rowMeans(d)
    M <- mean(d)
    a <- sweep(d, 1, m)
    b <- sweep(a, 2, m)
    return(b + M)
  }
  A <- Akl(x)
  B <- Akl(y)
  dCov <- sqrt(mean(A * B))
  dVarX <- sqrt(mean(A * A))
  dVarY <- sqrt(mean(B * B))
  V <- sqrt(dVarX * dVarY)
  if (V > 0)
    dCor <- dCov/V
  else dCor <- 0
  return(list(dCov = dCov, dCor = dCor, dVarX = dVarX, dVarY = dVarY))
}

- dist(x): evaluates as a view
- Defer rowMeans(x) until later
- Need to evaluate
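To make the arithmetic in `dcor` concrete, here is a rough Java port for the univariate case with `index = 1`. It is an eager, illustrative sketch: it does not use Renjin or reproduce its deferred evaluation, the class and method names are hypothetical, and the `Math.max(0, ...)` clamp is an addition to guard against tiny negative values from floating-point rounding.

```java
final class DistanceCorrelation {
    // Pairwise absolute differences, like as.matrix(dist(x)) for a univariate sample.
    static double[][] distances(double[] x) {
        int n = x.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                d[i][j] = Math.abs(x[i] - x[j]);
            }
        }
        return d;
    }

    // Double-centering (the slide's Akl): subtract row and column means, add the grand mean.
    static double[][] center(double[][] d) {
        int n = d.length;
        double[] rowMean = new double[n];
        double grandMean = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowMean[i] += d[i][j] / n;
            }
            grandMean += rowMean[i] / n;
        }
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // d is symmetric, so the column means equal the row means
                out[i][j] = d[i][j] - rowMean[i] - rowMean[j] + grandMean;
            }
        }
        return out;
    }

    // mean(a * b) over all matrix entries.
    static double meanProduct(double[][] a, double[][] b) {
        int n = a.length;
        double total = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                total += a[i][j] * b[i][j];
            }
        }
        return total / ((double) n * n);
    }

    static double dcor(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("samples must be non-empty and of equal size");
        }
        double[][] a = center(distances(x));
        double[][] b = center(distances(y));
        double dCov = Math.sqrt(Math.max(0, meanProduct(a, b)));
        double dVarX = Math.sqrt(Math.max(0, meanProduct(a, a)));
        double dVarY = Math.sqrt(Math.max(0, meanProduct(b, b)));
        double v = Math.sqrt(dVarX * dVarY);
        return v > 0 ? dCov / v : 0;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 7};
        double[] y = {5, 7, 11, 17}; // y = 2x + 3, an exact linear relationship
        System.out.println(dcor(x, y));
    }
}
```

Every step here allocates a full n-by-n matrix, which is the cost that evaluating `dist(x)` as a view and deferring `rowMeans` aims to avoid.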

Page 20

Page 21

[Figure: Run time of distance correlation of 10 pairs of variables. x-axis: Number of Observations (1000, 2000, 5000, 10000); y-axis: ticks from 0 to 200, units not recoverable; series: GNU R, C, Renjin.]

Page 22

Where do we go from here?

Page 23

Inspired by…

Page 24

Join us!

- Download & Test
- Contribute!
- Contract us for Commercial Support
- Sponsor Development!

> Renjin.org