If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp...

Preview:

Citation preview

If You Can’t Beat ’Em …

Some Comments on Using R Along With Other Languages

Dirk Eddelbuettel

31 July 2016

Open Source Statistical Software for Data Science — Invited PapersJoint Statistical Meetings (JSM), Chicago, IL

JSM 2016 1/27

Welcome to Lollapalooza!

JSM 2016 2/27

Overview

Content

• Single- or Multi-Language ?

• Interlude

• Empirics

• Illustration

• Conclusion

JSM 2016 3/27

Single- Or Multi-Language ?

JSM 2016 4/27

Claim: 1 + 1 > 2

Better with more than one?

• No one language fits all

• Real-world projects are frequently multi-language

• See e.g. job ads which rarely ever list just one language

JSM 2016 5/27

Counter-claim: 1 + 1 < 2

Or better with just one?

• Mental switching cost between languages? Possibly

• Interop difficult and less portable? Maybe, but that is anargument against weak systems / OSs

• Easier / less to learn?

• “More hoops” to code?

JSM 2016 6/27

Mental switching costs?

JSM 2016 7/27

So which one is it?

Open Question

• Hard to measure or test: Any empirics on real world projects?

• Code competition / comparisons (e.g. Project Euler): Are theyrealistic?

JSM 2016 8/27

Interlude

JSM 2016 9/27

John Chambers

Chambers (2008) Software ForData AnalysisChapters 10 and 11 devoted toInterfaces I: C and Fortran andInterfaces II: Other Systems.

JSM 2016 10/27

John Chambers

Chambers (2016) Extending RAn entire book about this withconcrete Python, Julia and C++code and examples

JSM 2016 11/27

John Chambers

Chambers 2016, Chapter 4

The fundamental lesson about programming in the large isthat requires a correspondingly broad and flexibleresponse. In particular, no single language or softwaresystem os likely to be ideal for all aspects. Interfacingmultiple systems is the essence. Part IV explores thedesign of of interfaces from R.

JSM 2016 12/27

Chamber 1976

Thanks to John Chambers for ascanned copy of this historic sketch.

JSM 2016 13/27

Empirics

JSM 2016 14/27

Growth

2010 2012 2014 2016

010

020

030

040

050

060

070

0

Growth of Rcpp usage on CRAN

n

Number of CRAN packages using Rcpp (left axis)Percentage of CRAN packages using Rcpp (right axis)

010

020

030

040

050

060

070

0

2010 2012 2014 2016

02

46

8

JSM 2016 15/27

Pagerank

library(pagerank) # github.com/andrie/pagerank

cran <- ”http://cloud.r-project.org”

pr <- compute_pagerank(cran)round(100*pr[1:5], 3)

## Rcpp MASS ggplot2 Matrix mvtnorm## 2.452 1.771 1.088 0.920 0.749

JSM 2016 16/27

Pagerank

bootrglzoodata.tablenlmeRCurlcodaXMLforeachreshape2jsonliteRcppArmadillodplyrigraphspstringrhttrlatticeplyrsurvivalmvtnormMatrixggplot2MASSRcpp

0.005 0.010 0.015 0.020 0.025

Top 25 of Page Rank as of July 2016

JSM 2016 17/27

Pagerank

Rcpp

MASS

ggplot2

Matrix

mvtnorm

lattice

plyr

stringr

RColorBrewer

dichromat

munsell

labeling

stringi

magrittr

colorspace

digest

gtable

reshape2

scales

sp

MatrixModels

graph

SparseM

sfsmisc

→→→

ImportsLinkingToEnhances

Top 5 packages by page rank

JSM 2016 18/27

Illustration

JSM 2016 19/27

Using R to C++ to Boost to Python, and back

Setup

py_cflags <- system(”python2.7-config --cflags”, intern=TRUE)se <- Sys.setenv; ge <- Sys.getenv # shorthands to typesetse(”PKG_CFLAGS”=sprintf(”%s %s”, ge(”PKG_CFLAGS”), py_cflags))se(”PKG_CXXFLAGS”=sprintf(”%s %s”, ge(”PKG_CXXFLAGS”), py_cflags))py_ldflags <- system(”python2.7-config --ldflags”, intern=TRUE)se(”PKG_LIBS”=sprintf(”%s %s %s”, ge(”PKG_CFLAGS”),

”-lboost_python-py27”, py_ldflags))

JSM 2016 20/27

Using R to C++ to Boost to Python, and back

#include <Rcpp.h>#include <Python.h>

// [[Rcpp::export]]void initialize_python() {

Py_SetProgramName(””); /* optional but recommended */Py_Initialize();

}

// [[Rcpp::export]]void hello_python() {

PyRun_SimpleString(”from time import time,ctime\n””print ’Today is’,ctime(time())\n”);

}

JSM 2016 21/27

Using R to C++ to Boost to Python, and back

Hello, World: Called from R

initialize_python()hello_python()

## Today is Sat Jul 30 13:38:01 2016

More at http://gallery.rcpp.org/articles/rcpp-python/Disclaimer: For illustration purposes. Works as designed on Ubuntu. Not meant to be universally portable to all three OSs.

JSM 2016 22/27

Conclusion

JSM 2016 23/27

Being Polyglot

Mixing Languages

• Common

• Natural

• Unavoidable

JSM 2016 24/27

Being Polyglot

Consequences

• Must make it easier to interoperate

• Stop bickering among ourselves

• Build systems that are larger that the sum of their parts

JSM 2016 25/27

Being Polyglot

Just Do It

JSM 2016 26/27

Appendix

Lars Wirzenius “Which license is the most free?”Free software licences can be roughly grouped into permissive and copyleft ones.[…] A permissive licence lets you do things that a copyleft one forbids, so clearly thepermissive licence is more free. A copyleft licence means software using it won’tever become non-free against the wills of the copyright holders, so clearly acopyleft licence is more free than a permissive one.

Both sides are both right and wrong, of course, which is why this argument willcontinue forever. […]

If a discussion about the relative freedom of licence types becomes heated, stepaway. It’s not worth participating anymore.

http://yakking.branchable.com/posts/comparative-freeness/

JSM 2016 27/27

Recommended