73
R and ‘Faster Data’ The Case for Rcpp Dirk Eddelbuettel ISM HPCCON 2015 & ISM HPC on R Workshop The Institute of Statistical Mathematics, Tokyo, Japan October 9 - 12, 2015 Released under a Creative Commons Attribution-ShareAlike 4.0 International License. 1/73

R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

R and lsquoFaster DatarsquoThe Case for Rcpp

Dirk EddelbuettelISM HPCCON 2015 amp ISM HPC on R WorkshopThe Institute of Statistical Mathematics Tokyo JapanOctober 9 - 12 2015Released under a Creative Commons Attribution-ShareAlike 40 International License

173

Introduction

273

A Very Kind Tweet

373

And Another Tweet

473

And Yet Another Tweet

573

And Why Not Another Tweet

673

And Last But Not Least

773

Extending R

873

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 2: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Introduction

273

A Very Kind Tweet

373

And Another Tweet

473

And Yet Another Tweet

573

And Why Not Another Tweet

673

And Last But Not Least

773

Extending R

873

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 3: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

A Very Kind Tweet

373

And Another Tweet

473

And Yet Another Tweet

573

And Why Not Another Tweet

673

And Last But Not Least

773

Extending R

873

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 4: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

And Another Tweet

473

And Yet Another Tweet

573

And Why Not Another Tweet

673

And Last But Not Least

773

Extending R

873

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 5: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

And Yet Another Tweet

573

And Why Not Another Tweet

673

And Last But Not Least

773

Extending R

873

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 6: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

And Why Not Another Tweet

673

And Last But Not Least

773

Extending R

873

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 7: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

And Last But Not Least

773

Extending R

873

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 8: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Extending R

873

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 9: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

973

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 10: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

1073

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 11: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1173

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 12: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

1273

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 13: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1373

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 14: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1473

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 15: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1573

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 16: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1673

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 17: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason

1773

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 18: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1873

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 19: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focusfor us in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)

1973

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 20: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

And Why C++

middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independentwidely-used and still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

2073

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 21: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions

middot C++11 (and C++14 and beyond) add enough to becalled a fifth language

NB Meyers original list of four languages appeared years before C++11 2173

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 22: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the

machine level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly

shielded

2273

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 23: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Interface Vision

2373

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 24: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Bell Labs May 1976

2473

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 25: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in

CC++Fortranmiddot Extremely high performance (in both serial and parallel

modes)middot Interpreted code with

middot An accessible high-level language made for Programmingwith Data

middot An interactive workflow for data analysismiddot Support for rapid prototyping research and

experimentation2573

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 26: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using RcppSugar

middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip

middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip

middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules

2673

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 27: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Getting Started

2773

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 28: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

sourceCpp Jumping Right In

RStudio makes starting very easy

2873

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 29: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

A First Example Contrsquoed

The following file gets createdinclude ltRcpphgtusing namespace Rcpp

This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)

[[Rcppexport]]NumericVector timesTwo(NumericVector x)

return x 2

You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation

RtimesTwo(42)

2973

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 30: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

A First Example Contrsquoed

So what just happened

middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name

3073

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 31: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

A First Example Contrsquoed

Try it

middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory

or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it

middot Then at the R prompt

simpletimesTwo(21) more interestingtimesTwo(c(12344101))

3173

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 32: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

cppFunction

cppFunction() creates compiles and links a C++ file andcreates an R function to access it

cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function

3273

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 33: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

evalCpp

evalCpp() evaluates a single C++ expression Includes anddependencies can be declared

This allows us to quickly check C++ constructs

library(Rcpp)evalCpp(2 + 2) simple test

[1] 4

evalCpp(stdnumeric_limitsltdoublegtmax())

[1] 1797693e+308

3373

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 34: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Speed

3473

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 35: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Speed Example (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3573

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 36: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Speed Example in R

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3673

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 37: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Speed Example Timed

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348

3773

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 38: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Speed Example in C C++

A C or C++ solution can be equally simple

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

But how do we call it from R

3873

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 39: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 40: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt

int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)

[[Rcppexport]]SEXP fibonacci_c(SEXP n)

SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result

Rsapply(010 fibonacci_c)

4073

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 41: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Speed Example in C C++

But Rcpp makes this much easier

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

4173

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 42: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Speed Example Comparing R and C++

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]

test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311

A nice gain of a few orders of magnitude4273

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 43: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

4373

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 44: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Speed Example Footnote Also Due to Matt

include ltRcpphgt

[[Rcppplugins(cpp11)]]

constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +

fibonacci_recursive_constexpr(n - 2))

[[Rcppexport]]int constexprFib()

const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result

4473

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 45: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Popularity

4573

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 46: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Used by 483 CRAN Packages as of this week

4673

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 47: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Page Rank One (According to Andrie de Vries)

4773

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 48: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Case Study

4873

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 49: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Previous Status

middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots

4973

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 50: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Easy R Fix

middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)

5073

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 51: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

What does Redis do

middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value

middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings

5173

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 52: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

What is wrong with that

middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points

middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using

RApiSerialize

5273

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 53: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Getting Data

library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)

5373

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 54: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard Monthly Plot

Apr1982

Jan1984

Jan1986

Jan1988

Jan1990

Jan1992

Jan1994

Jan1996

Jan1998

Jan2000

Jan2002

Jan2004

Jan2006

Jan2008

Jan2010

Jan2012

Jan2014

Oct2015

SP Apr 1982 Oct 2015

200

400

600

800

1000

1200

1400

1600

1800

2000

200

400

600

800

1000

1200

1400

1600

1800

2000

5473

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 55: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Setter Version 1 via rredis

insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))

dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)

invisible(NULL)

5573

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 56: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Getter Base Version via rredis

getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5673

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 57: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Getter Rcpp Version 1

getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)

z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]

x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx

5773

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 58: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Getter Rcpp Version 2

getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

5873

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 59: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Timings

key lt- quandlcmesp1res lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]

print(res)

test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801

5973

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 60: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Can we do better

middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression

6073

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 61: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

New Rcpp Function R Side

insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))

redis$listRPush(key dat[i])invisible(NULL)

6173

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 62: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

New Rcpp Function Setter

redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)

uses binary protocol see hiredis docsredisReply reply =

static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))

stdstring res = freeReplyObject(reply)return(res)

6273

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 63: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

New Rcpp Function Getter

redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)

redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d

keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)

checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v

freeReplyObject(reply)return(x)

6373

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 64: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Use This Way

getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx

6473

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 65: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Timings

key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)

getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]

print(res2)

test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992

6573

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 66: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Status

middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times

fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API

6673

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 67: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Simple MessagePack buffer creation then sendingMessagePack buffer as binary load

typedef msgpacktypetupleltdouble int int intgt msg_t

msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it

replynew =static_castltredisReplygt(redisCommand(d RPUSH s b

keyc_str()bufferdata() buffersize()))

freeReplyObject(replynew)

6773

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 68: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Time Series Dashboard

Conclusion

middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for

accelerated modeling

6873

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 69: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

The End

6973

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 70: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Documentation

middot The Rcpp package comes with nine pdf vignettes andnumerous help pages

middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)

middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume

middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features

7073

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 71: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Rcpp Gallery

7173

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 72: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

The Rcpp book

7273

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End
Page 73: R and ‘Faster Data’ - Dirk Eddelbuetteldirk.eddelbuettel.com/papers/ism-hpccon-rcpp-fast-date.pdf · R and ‘Faster Data’ TheCaseforRcpp DirkEddelbuettel ISMHPCCON2015&ISMHPConRWorkshop

Very Last Words

Thank Youdirkeddelbuettelcom

7373

  • Introduction
  • Extending R
  • Interface Vision
  • Getting Started
  • Speed
  • Popularity
  • Case Study
  • The End