
The Compatibility Challenge: Examining R and Developing TERR


Slides from Michael Sannella, architect for TIBCO Enterprise Runtime for R (TERR), on "The Compatibility Challenge: Examining R and Developing TERR". Presented at useR! 2014.


Michael Sannella ([email protected])

TIBCO Spotfire

http://spotfire.tibco.com

These slides contain all the content from the useR! 2014 poster of the same name.


What is TERR?

• TERR: TIBCO® Enterprise Runtime for R

• A commercial R-compatible statistics engine.

• The latest in a family of scripting engines:
  • S, S-PLUS®, R, TERR

• Embedded in Spotfire®:
  • Provides R scripting and powers analytic tools.

• Free Developer's Edition available.

• Commercially available for custom integration.


Why R Compatibility?

• We developed TERR as a completely independent engine, without looking at R sources or documentation.

• However, we want TERR to be able to load and run R packages from CRAN and other repositories, so TERR must be compatible with R.
  • Many CRAN packages work with TERR without change.
  • Some CRAN packages can be made to work with TERR if we rebuild them, without changing any of their code.
  • Some (hopefully few) CRAN packages require code changes to make them work under TERR.


How to Achieve R Compatibility?

• Our approach: Examine R’s behavior, write unit tests, and make TERR pass these tests.

• R compatibility is a moving target: As R and the CRAN packages change over time, we must update TERR to follow.

• R Wrinkles: While tracking down TERR incompatibility issues, we uncovered some interesting R behavior.

• In some cases, we decided not to make TERR completely compatible with R.
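As a rough illustration of the examine-then-test approach, the sketch below records a few values that R returns and checks whether the running engine reproduces them exactly. The `cases` list and the `run_cases()` helper are hypothetical names for this example. They are not part of TERR's actual test suite.

```r
# Hypothetical compatibility harness: each case pairs an R expression with
# the value R was observed to return. Replaying the same cases on another
# engine reports which behaviors it reproduces exactly.
cases <- list(
  list(expr = quote(typeof(1:3)),                 expected = "integer"),
  list(expr = quote(drop(matrix(1:3, nrow = 1))), expected = 1:3),
  list(expr = quote(Encoding("abc")),             expected = "unknown")
)

# Evaluate every case. Return TRUE where the result is identical to the
# recorded value. An error counts as a mismatch, because the captured
# condition object never equals the expected value.
run_cases <- function(cases) {
  vapply(cases, function(case) {
    actual <- tryCatch(eval(case$expr), error = function(e) e)
    identical(actual, case$expected)
  }, logical(1))
}

results <- run_cases(cases)
results
```

Using `identical()` rather than `==` is deliberate in this sketch. It also catches differences in type and attributes, such as an integer vs. a double result or a leftover `dim`.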


Developing Compatibility Using Key Packages

• For some elements of R, a good way to develop compatibility was to focus on a key package or application that exercises certain functionality.

• Key package: The CRAN Matrix package
  • Matrix fully exercised the S4 object system.
  • Getting the Matrix tests working gave us confidence that we had a compatible implementation of S4.

• Key application: The RStudio GUI
  • RStudio exercised the R engine entries used to embed R within an application.
  • RStudio exercised source-code access via the “srcref” attribute on parsed expressions and functions.
  • RStudio exercised the trace and browser functions used for breakpoints and single-stepping.
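The “srcref” mechanism can be observed from R itself. This minimal sketch parses a small, made-up function with `keep.source = TRUE` and reads back the source text attached to the resulting function object. The function `f` exists only for this example.

```r
# Parse source text while keeping source references ("srcref" attributes).
code  <- c("f <- function(x) {", "  x + 1", "}")
exprs <- parse(text = code, keep.source = TRUE)

# Evaluating the assignment creates f. The function object carries a
# "srcref" attribute that points back at the text it was parsed from.
eval(exprs[[1]])
src <- as.character(attr(f, "srcref"))
src   # c("function(x) {", "  x + 1", "}")
```

A debugger front end can use this kind of mapping from a function back to its lines to show breakpoints in the original file.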


R Wrinkles: Character Encoding

• TERR/R Incompatibility: R uses the "unknown" character encoding for most strings, whereas TERR uses the UTF-8 encoding by default.

• R inconsistency: Character encoding of string constants.

R-3.0.1/Linux:

    > Encoding(c('abc\xC4xyz', 'abc\u00C4xyz'))
    [1] "unknown" "UTF-8"

R-3.0.1/Windows (and TERR 2.6):

    > Encoding(c('abc\xC4xyz', 'abc\u00C4xyz'))
    [1] "latin1" "UTF-8"
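The encoding mark on a string constant differs by platform. Code that must behave the same on R and TERR can declare the encoding explicitly and normalize to UTF-8. The sketch below assumes the byte in the first string is meant as latin1, which the Windows output suggests but the code cannot know on its own.

```r
# A string built with a \x escape is raw bytes. Its encoding mark is
# platform-dependent ("unknown" on Linux, "latin1" on some Windows builds).
bytes_str <- "abc\xC4xyz"
# A string built with a \u escape is always marked "UTF-8".
utf8_str <- "abc\u00C4xyz"

# Assumption: byte 0xC4 is meant as latin1 (U+00C4). Declare that explicitly,
# then convert both strings to UTF-8 so the marks no longer vary by platform.
Encoding(bytes_str) <- "latin1"
both <- enc2utf8(c(bytes_str, utf8_str))

Encoding(both)                # "UTF-8" "UTF-8"
identical(both[1], both[2])   # TRUE: same bytes once both are UTF-8
```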


R Wrinkles: Character Encoding (2)

• R inconsistency: The R/Windows parser doesn't accept many UTF-8 chars in names.

R 3.1.0 Linux (and TERR 2.6):

    > typeof(parse(text="a\u30A4")[[1]])
    [1] "symbol"

R 3.1.0 Windows:

    > typeof(parse(text="a\u30A4")[[1]])
    Error in parse(text = "a<U+30A4>") :
      <text>:1:7: unexpected symbol
    1: a<U+30A4
             ^
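When code generates names programmatically, one way to avoid depending on which characters a given parser accepts is to build the symbol directly from a string. This sketch assumes the name is already available as a character string. The variable names are only illustrative.

```r
# Build the symbol from a string instead of parsing source text, so the
# result does not depend on which characters the parser accepts in names.
nm  <- "a\u30A4"
sym <- as.name(nm)
typeof(sym)   # "symbol"

# The symbol can be bound and evaluated like any other name.
env <- new.env()
assign(nm, 42, envir = env)
eval(sym, env)   # 42
```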


R Wrinkles: Non-Local Returns

• Initially, we had problems using TERR in RStudio, which executes functions with the following form:

    function() {
        tryCatch(return(someCalc()),
                 error = function(e) return("err"))
    }

• The first argument to tryCatch is evaluated in a promise, but it can include a “non-local” return that unwinds the stack to the function where “return” appears.
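This behavior can be checked in plain R. In this sketch, `someCalc()` and `failingCalc()` are hypothetical stand-ins for the computation RStudio runs. The trailing string in `run()` is there only to show which path leaves the function early.

```r
# Hypothetical stand-ins for the computation in the pattern above.
someCalc    <- function() 42
failingCalc <- function() stop("boom")

run <- function(calc) {
  tryCatch(return(calc()),
           error = function(e) return("err"))
  "fell through"   # reached only when tryCatch() itself returns normally
}

run(someCalc)     # 42: return() inside the promise exits run() directly
run(failingCalc)  # "fell through": the handler's return() exits only the handler
```

The two calls differ because `return()` leaves whichever function it was written in. Inside the promise, that is `run()`. Inside the handler, it is only the handler closure, whose value then becomes the value of `tryCatch()`.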


R Wrinkles: Non-Local Returns (2)

• Consider the following code:

    yy <- function(expr, n) {
        cat("yy: n=", n, "\n")
        if (n > 1) {
            yy(expr, n - 1)
        } else {
            expr
        }
        "yyret"
    }

    zz <- function(n) {
        yy(return(n), n)
        "zzret"
    }

• When the expression expr is finally evaluated in yy, it unwinds multiple calls to yy and returns from the outer function zz:

    > zz(3)
    yy: n= 3
    yy: n= 2
    yy: n= 1
    [1] 3
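The unwinding happens only when the promise is actually forced. This variant of the yy/zz example adds a hypothetical `force_it` argument, which is not in the original code, so the two paths can be compared side by side.

```r
yy_flag <- function(expr, n, force_it = TRUE) {
  if (n > 1) {
    yy_flag(expr, n - 1, force_it)   # pass the unevaluated promise down
  } else if (force_it) {
    expr   # forcing the promise evaluates return(n) in zz_flag's frame
  }
  "yyret"
}

zz_flag <- function(n, force_it = TRUE) {
  yy_flag(return(n), n, force_it)
  "zzret"
}

zz_flag(3)                    # 3: the forced promise unwinds every yy_flag frame
zz_flag(3, force_it = FALSE)  # "zzret": return(n) is never evaluated
```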


Compatibility Challenge: R’s C API

• Many CRAN packages contain C code calling into the R engine via C entries (the “Rapi” API).

• We implement Rapi entries in TERR as we find CRAN packages that need them.

• Problem: the USE_RINTERNALS macro
  • If USE_RINTERNALS is defined, some R macros directly access R object structures instead of calling Rapi entries.
  • This may improve performance for some code (but this is not a panacea).
  • This won't work with TERR unless we make the TERR object structure identical to the R object structure.

• Solution: We can make some packages compatible by rebuilding them without USE_RINTERNALS defined.
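Before rebuilding, it can help to find where a package turns the macro on. The helper below is a hypothetical sketch, not a TERR tool. It assumes the macro is enabled either by a `#define` in the package's `src` files or by a `-D` flag in a Makevars file, and it simply searches for those patterns. The demo package it builds under `tempdir()` is made up for the example.

```r
# Hypothetical helper: list files in a package source tree that turn on
# USE_RINTERNALS, via "#define" in C/C++ code or "-D" in a Makevars file.
find_rinternals <- function(pkg_dir) {
  src_dir <- file.path(pkg_dir, "src")
  if (!dir.exists(src_dir)) return(character(0))
  files <- list.files(src_dir, pattern = "\\.(c|cc|cpp|h|hpp)$|^Makevars",
                      full.names = TRUE, recursive = TRUE)
  defines <- vapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    any(grepl("(#\\s*define\\s+|-D)USE_RINTERNALS\\b", lines, perl = TRUE))
  }, logical(1), USE.NAMES = FALSE)
  files[defines]
}

# Usage on a made-up package: one source file defines the macro, one does
# not, and Makevars also sets it as a preprocessor flag.
pkg <- file.path(tempdir(), "demoPkg")
dir.create(file.path(pkg, "src"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("#define USE_RINTERNALS", "#include <Rinternals.h>"),
           file.path(pkg, "src", "fast.c"))
writeLines("#include <Rinternals.h>", file.path(pkg, "src", "safe.c"))
writeLines("PKG_CPPFLAGS = -DUSE_RINTERNALS", file.path(pkg, "src", "Makevars"))
basename(find_rinternals(pkg))   # fast.c and Makevars, in list.files() order
```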


Compatibility Challenge: Packages Including Base R Code

• Symptom: Trying to run Matrix code, we saw an unexpected call to .Internal, though it didn’t appear in the Matrix sources:

    > checkMatrix(A)
    Error in .Internal:
      unimplemented .Internal function: drop

• Reason: S4 setGeneric can incorporate base function definitions into a generic default method.
  • Matrix defines methods for drop, which creates a generic with base::drop as the default.

• Solution: When TERR loads a package with an S4 default method from a system library (base, stats, graphics, utils), it substitutes the TERR function from that library.
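The effect is easy to reproduce in R. In this sketch, the class `Box` is a hypothetical stand-in for a Matrix class. Defining a `drop` method for it makes R create an S4 generic whose default method is derived from `base::drop`. As a result, base R code, `.Internal` call included, is stored with the new generic.

```r
library(methods)

# Hypothetical S4 class standing in for a Matrix class.
setClass("Box", slots = c(x = "numeric"))

# Defining a method for drop implicitly creates an S4 generic for it
# (R prints a message saying so). The generic's default method is derived
# from base::drop, so it carries base R's own function body.
setMethod("drop", "Box", function(x) x@x)

default_body <- body(getMethod("drop", "ANY"))
default_body   # .Internal(drop(x))

drop(new("Box", x = c(1, 2)))   # dispatches to the Box method
```

An engine that loads such a package therefore runs base R's `.Internal(drop(x))` unless it swaps in its own `drop`, which is the substitution described in the Solution bullet.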


Compatibility Challenge: The smoothEnds Problem

• Symptom: With the IRanges package loaded, the R system function stats::smoothEnds works with an IRanges Rle object:

    > x <- Rle((-4:4)^2)
    > options(dropRle = TRUE)
    > stats::smoothEnds(x)
    numeric-Rle of length 9 with 9 runs
      Lengths: 1 1 1 1 1 1 1 1 1
      Values : 16 9 4 1 0 1 4 9 16

• However, this doesn't work in TERR:

    > x <- Rle((-4:4)^2)
    > options(dropRle = TRUE)
    > stats::smoothEnds(x)
    Error: 'y' must be numeric


Compatibility Challenge: The smoothEnds Problem (2)

• Reason:
  • TERR implements stats::smoothEnds in native C code.
  • Most likely, R implements it as R-language code that calls arithmetic operators, and IRanges defines methods for those operators on Rle objects.

• Solution:
  • Rewrite TERR’s version of stats::smoothEnds as R-language code.

• Deeper issue:
  • A TERR algorithm has to call the same methods as R, if they can be redefined for particular object classes.
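The deeper issue can be illustrated with a toy class. The sketch below uses a hypothetical S3 class `wrapped` that supplies its own arithmetic through an `Ops` method, loosely in the spirit of `Rle`. The functions `rescale_r()` and `rescale_native()` are made up. The second stands in for compiled code that checks its input type before computing. Neither is R's or TERR's actual `smoothEnds`.

```r
# Hypothetical S3 class wrapping a numeric vector, with its own arithmetic.
wrap <- function(v) structure(list(v = v), class = "wrapped")
Ops.wrapped <- function(e1, e2) {
  unwrap <- function(e) if (inherits(e, "wrapped")) e$v else e
  # Apply the original operator (+, *, ...) to the unwrapped values, rewrap.
  wrap(get(.Generic)(unwrap(e1), unwrap(e2)))
}

# R-language style: each operator call dispatches, so Ops.wrapped runs.
rescale_r <- function(y) 2 * y - 1

# Native style (stand-in for compiled code): the type is checked up front,
# so objects that are not plain numeric vectors are rejected.
rescale_native <- function(y) {
  if (!is.numeric(y)) stop("'y' must be numeric")
  2 * y - 1
}

w <- wrap(c(2, 4))
rescale_r(w)$v                                         # 3 7
tryCatch(rescale_native(w), error = conditionMessage)  # "'y' must be numeric"
```

The same arithmetic succeeds or fails depending only on whether the implementation goes through the dispatching operators. For this reason, rewriting the TERR function in R-language code restores compatibility.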


For More Information

• Drop by the TIBCO booth.

• Attend our talks:
  • Louis Bajuk-Yorgan: “Deploying R into Business Intelligence and Real-time Applications” (Business track, Session 5, Wednesday 16:00)
  • Stephen Kaluzny: “Software Testing and the R Language” (Business track, Session 6, Thursday 10:00)

• Spotfire and TERR:
  • http://spotfire.tibco.com/terr

• TERR Developer’s Edition:
  • http://www.tibcommunity.com/community/products/analytics/terr
  • http://tap.tibco.com/
  • https://docs.tibco.com/products/tibco-enterprise-runtime-for-r
