Interfacing C++ code from Rpeople.math.aau.dk/~sorenh/teaching/2012-ASC/day5/... · 1 Calling C++...

Preview:

Citation preview

Interfacing C++ code from R

Søren Højsgaard

Department of Mathematical Sciences

Aalborg University, Denmark

November 20, 2012

Printed: November 20, 2012 File: interfaceCpp-slides.tex

2

Contents

1 Calling C++ from R 3

2 Some important libraries and packages 4

3 Example: The exponential function 63.1 Using Rcpp together with inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

4 Example: Calculating Fibonacci recursively 12

5 Example: Matrix multiplication 155.1 Using Rcpp together with inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175.2 Using RcppArmadillo together with inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185.3 Benchmarking – III . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

6 Compiling without using inline 216.1 Example: Matrix multiplication using Rcpp . . . . . . . . . . . . . . . . . . . . . . . . . . . 256.2 Example: Matrix multiplication using RcppArmadillo . . . . . . . . . . . . . . . . . . . . . . 276.3 Example: Inverting a symmetric positive definite matrix using RcppArmadillo . . . . . . . . . . 28

7 Further examples on using RcppArmadillo with inline 307.1 Example: Extracting submatrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

8 Calling R from C++ 33

9 Building packages using Rcpp libraries 36

10 EXERCISES 37

3

1 Calling C++ from R

C++ code can be called from R.

• Easily done using the Rcpp package. (There are other ways not to be

discussed here).

• The Rcpp package combines nicely with the inline package.

• There are several existing libraries available with a C++ interface that we may

use – instead of “reinventing the wheel” (as we did when creating our own

matrix multiplication functions).

4

2 Some important libraries and packages

• Armadillo is an open-source C++ linear algebra library (matrix maths) aiming

towards a good balance between speed and ease of use.

http://arma.sourceforge.net/

R interface via RcppArmadillo

• Eigen is a C++ template library for linear algebra: matrices, vectors, numerical

solvers, and related algorithms.

http://eigen.tuxfamily.org/

R interface via RcppEigen

• The GNU Scientific Library (GSL) is a numerical library for C and C++

programmers.

http://www.gnu.org/software/gsl/

R interface via RcppGSL

5

Notice:

• To use these packages we must program in C++ rather than in C.

• Dirk Eddelbuettel provides many many examples on his website:

http://dirk.eddelbuettel.com/code/rcpp.html

6

3 Example: The exponential function

Recall our implementation of the exponential function in R:

> expfunR <- function(x, eps=.Machine$double.eps){

+ if (x<0) {

+ 1/expfunR(-x)

+ } else {

+ n <- 0; term <- 1; ans <- 1

+ while(abs(term)> eps) {

+ n = n + 1

+ term = term * x / n

+ ans <- ans + term

+ }

+ ans

+ }

+ }

7

A pure C implementation (which ignores the numerical difficulties when x < 0) is:

1 #include <math.h>

2 double C_expfun2 (double x){

3 double ans=1.0, term =1.0, eps=1e-16;

4 int n=0;

5 while (fabs(term)>eps){

6 n++;

7 term = term * x / n;

8 ans = ans + term;

9 }

10 return(ans);

11 }

8

3.1 Using Rcpp together with inline

> src <- '+ double x = as<double>(x_);

+ double ans=1.0, term=1.0, eps=1e-16;

+ int n=0;

+ while (fabs(term)>eps){

+ n++;

+ term = term * x / n;

+ ans = ans + term;

+ }

+ return(wrap(ans));'

Compiling, linking and loading is done with one function call

> library(inline)

> expfunC <- cxxfunction(signature(x_="numeric"), plugin="Rcpp", body=src)

• R data object (SEXPs) converted to C++ objects with as()

• C++ data object converted R data objects (SEXPs) with wrap()

9

An alternative implementation is to use the original function (accounting for

nummerical difficulties when x < 0 has been resolved):

> incltxt <- 'double C_expfun2 (double x){

+ double ans=1.0, term=1.0, eps=1e-16;

+ int n=0;

+ while (fabs(term)>eps){

+ n++;

+ term = term * x / n;

+ ans = ans + term;

+ }

+ return(ans);

+ }'

> expfunC2 <- cxxfunction(signature(x_="numeric"), includes=incltxt,

+ plugin="Rcpp", body='+ double x = as<double>(x_);

+ if (x>0){

+ return wrap(C_expfun2(x));

+ } else {

+ return wrap(1/C_expfun2(-x));

+ }

+ ')

10

> xx <- 1

> c(expfunR(xx), expfunC(xx), expfunC2(xx))

[1] 2.718282 2.718282 2.718282

> xx <- 20

> c(expfunR(xx), expfunC(xx), expfunC2(xx))

[1] 485165195 485165195 485165195

> xx <- -20

> c(expfunR(xx), expfunC(xx), expfunC2(xx))

[1] 2.061154e-09 5.621884e-09 2.061154e-09

> (expfunC(-20)-exp(-20))/exp(-20)

[1] 1.727543

> (expfunC2(-20)-exp(-20))/exp(-20)

[1] 2.006596e-16

>

11

> library(rbenchmark)

> cols <- c("test", "replications", "elapsed", "relative")

> N <- 20000

> benchmark(expfunR(10), expfunC(10), expfunC2(10),

+ columns=cols, order="relative", replications=N)

test replications elapsed relative

2 expfunC(10) 20000 0.08 1.0

3 expfunC2(10) 20000 0.08 1.0

1 expfunR(10) 20000 2.44 30.5

12

4 Example: Calculating Fibonacci recursively

The standard definition of the Fibonacci sequence is fn = fn−1 + fn−2 where

f0 = 0, f1 = 1.

A simple recursive implementation in R is:

> fibR <- function(n){

+ if (n==0)

+ return(0)

+ else

+ if (n==1)

+ return(1)

+ else

+ return(fibR(n-1)+fibR(n-2))

+ }

13

Easy to write fast C++ version:

> incltxt <-'+ int fibonacci(int n){

+ if (n==0)

+ return(0);

+ else

+ if (n==1)

+ return(1);

+ else

+ return(fibonacci(n-1)+fibonacci(n-2));

+ }'

> fibC <- cxxfunction(signature(x_="int"),

+ plugin="Rcpp",

+ include=incltxt,

+ body='+ int x = as<int>(x_);

+ return wrap( fibonacci(x) );

+ ')

14

> library(rbenchmark)

> cols <- c("test", "replications", "elapsed", "relative")

> M <- 30

> benchmark(fibR(M), fibC(M),

+ columns=cols, order="relative", replications=1)

test replications elapsed relative

1 fibR(M) 1 7.86 NA

2 fibC(M) 1 0.00 NA

15

5 Example: Matrix multiplication

Recall our first version of the matrix multiplication function:

1 /* File: matprod1.c: Calculate the product of matrices X and Y */

2 void matprod1(double *X, int *dimX , double *Y, int *dimY , double *ans){

3 double sum;

4 int ii, jj, kk;

5 int nrX=dimX[0], ncX=dimX[1], nrY=dimY[0], ncY=dimY [1];

6

7 for (ii=0; ii<nrX; ii++){

8 for (jj=0; jj<ncY; jj++){

9 sum = 0;

10 for (kk=0; kk<ncX; kk++){

11 sum = sum + X[ii+nrX*kk]*Y[kk+nrY*jj];

12 }

13 ans[ii+nrX*jj] = sum;

14 }

15 }

16 }

16

An interface using SEXPs is:

1 /* File: matprod2.c: Calculates the product of matrices X and Y */

2 #include <Rdefines.h>

3 #include "matprod1.h"

4

5 SEXP matprod2(SEXP X, SEXP Y) {

6 int nprot =0;

7 PROTECT(X = AS_NUMERIC(X)); nprot ++; /* Digest SEXPs from R */

8 PROTECT(Y = AS_NUMERIC(Y)); nprot ++;

9 double *xptr; xptr = REAL(X);

10 double *yptr; yptr = REAL(Y);

11 int *dimX; dimX = INTEGER(GET_DIM(X));

12 int *dimY; dimY = INTEGER(GET_DIM(Y));

13 SEXP ans; /* Create SEXP to hold result */

14 PROTECT(ans = allocMatrix(REALSXP , dimX[0], dimY [1])); nprot ++;

15 double *ansptr; ansptr = REAL(ans);

16 matprod1(xptr , dimX , yptr , dimY , ansptr ); /* Calculate product */

17 UNPROTECT(nprot); /* Wrap up; */

18 return(ans); /* Return the result to R */

19 }

where

1 void matprod1(double *X, int *dimX , double *Y, int *dimY , double *ans);

17

5.1 Using Rcpp together with inline

With Rcpp matrices can be indexed the usual way:

> src <- '+ NumericMatrix X(X_);

+ NumericMatrix Y(Y_);

+ NumericMatrix ans (X.nrow(), Y.ncol());

+ int ii, jj, kk;

+ for (ii=0; ii<X.nrow(); ii++){

+ for (jj=0; jj<Y.ncol(); jj++){

+ double sum = 0;

+ for (kk=0; kk<X.ncol(); kk++){

+ sum += X(ii, kk) * Y(kk, jj);

+ }

+ ans(ii,jj) = sum;

+ }

+ }

+ return(ans);'> library(inline)

> mprod5_inline_Rcpp <- cxxfunction(signature(X_="numeric", Y_="numeric"),

+ body = src, plugin="Rcpp")

18

5.2 Using RcppArmadillo together with inline

The Armadillo library is an excellent C++ package for linear algebra and

RcppArmadillo makes this easy

> src <- '+ arma::mat X = Rcpp::as<arma::mat>(X_);

+ arma::mat Y = Rcpp::as<arma::mat>(Y_);

+ arma::mat ans = X * Y;

+ return(wrap(ans));

+ '> mprod6_inline_RcppArma <- cxxfunction(signature(X_="numeric", Y_="numeric"),

+ body = src, plugin="RcppArmadillo")

19

5.3 Benchmarking – III

> library(rbenchmark)

> cols <- c("test", "replications", "elapsed", "relative")

> N <- 10

> A <- matrix(rnorm(1000000), nr=1000); B<-A[,1:5]

> benchmark(A %*% B, mprod5_inline_Rcpp(A,B), mprod6_inline_RcppArma(A,B),

+ columns=cols, order="relative", replications=N)

test replications elapsed relative

1 A %*% B 10 0.05 1.0

3 mprod6_inline_RcppArma(A, B) 10 0.06 1.2

2 mprod5_inline_Rcpp(A, B) 10 1.26 25.2

20

Tentative conclusions on the benchmarking:

• Speedwise, .Call() is better than .C().

• Speedwise, C is better than C++.

• Execution time of a program must be traded off with the programming time to

make the program work.

• For larger matrices our “own homegrown”C code seems to loose to the other

competitors.

21

6 Compiling without using inline

Rcpp based code can be compiled using R CMD SHLIB.

To do so, one must tell the compiler where to find the headers and tell the linker

which libraries to link against and where to find them. One way of doing so is by

creating a Makevars file.

Below are the Makevars files that get things to work on window and linux:

Using Rcpp: Using Rcpp alone, a Makevars file with these lines work on both

linux and windows:

PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`

22

Using RcppArmadillo: For compilation on windows, the file Makevars.win

contains these lines:

PKG_LIBS = $(BLAS_LIBS) $(FLIBS) $(LAPACK_LIBS) \

$(shell "Rscript.exe" -e "Rcpp:::LdFlags()")

PKG_CPPFLAGS = -I${R_HOME}/include -I${R_HOME}/library/Rcpp/include \

-I${R_HOME}/library/RcppArmadillo/include -I. -DNDEBUG

For compilation on linux, the Makevars file contains the lines:

PKG_LIBS = $(BLAS_LIBS) $(FLIBS) $(LAPACK_LIBS) \

$(shell "Rscript" -e "Rcpp:::LdFlags()")

## If Rcpp etc. are installed in /usr/local/lib/R/site-library

R_SITE=/usr/local/lib/R/site-library

PKG_CPPFLAGS = -I${R_HOME}/include -I${R_SITE}/Rcpp/include \

-I${R_SITE}/RcppArmadillo/include -I. -DNDEBUG

## If Rcpp etc. are installed in /usr/lib/R/ use instead:

### PKG_CPPFLAGS = -I${R_HOME}/include -I${R_HOME}/library/Rcpp/include \

### -I${R_HOME}/library/RcppArmadillo/include -I. -DNDEBUG

23

Using RcppEigen: For compilation on windows, the file Makevars.win contains

these lines:

PKG_LIBS = $(BLAS_LIBS) $(FLIBS) \

$(shell "Rscript.exe" -e "Rcpp:::LdFlags()")

PKG_CPPFLAGS = -I${R_HOME}/library/RcppEigen/include \

-I${R_HOME}/library/Rcpp/include -I. -DNDEBUG

For compilation on linux, the Makevars file contains the lines

PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"`## If Rcpp etc. are installed in /usr/local/lib/R/site-library

R_SITE=/usr/local/lib/R/site-library

PKG_CPPFLAGS = -I${R_HOME}/include -I${R_SITE}/Rcpp/include \

-I${R_SITE}/RcppEigen/include -I. -DNDEBUG

## If Rcpp etc. are installed in /usr/lib/R/ use instead:

## PKG_CPPFLAGS = -I${R_HOME}/library/Rcpp/include \

## -I${R_HOME}/library/RcppEigen/include -I. -DNDEBUG

24

NOTICE: For easier reading, a long line is broken up by putting ”\enter” as the

last characters of this line and the rest of this (logical) line on the next (physical)

line. (Hence there must be no white space following the backslash).

25

6.1 Example: Matrix multiplication using Rcpp

With Rcpp matrices can be indexed the usual way: Consider the file:

1 // File: matprod7.cpp

2 #include <Rcpp.h>

3

4 RcppExport SEXP matprod7( SEXP X_, SEXP Y_){

5 Rcpp:: NumericMatrix X(X_);

6 Rcpp:: NumericMatrix Y(Y_);

7 Rcpp:: NumericMatrix ans (X.nrow(), Y.ncol ());

8 int ii, jj, kk;

9 for (ii=0; ii<X.nrow (); ii++){

10 for (jj=0; jj<Y.ncol (); jj++){

11 double sum = 0;

12 for (kk=0; kk<X.ncol (); kk++){

13 sum += X(ii, kk) * Y(kk, jj);

14 }

15 ans(ii,jj) = sum;

16 }

17 }

18 return(ans);

19 }

26

To compile this file, first create a file named Makevars with the content (notice

the back quotes):

PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`

Next, compile the file as usual:

R CMD SHLIB src/matprod7.cpp

which creates matprod7.dll / matprod7.so.

> mprod7_Rcpp <- function(X,Y){ .Call("matprod7", X, Y) }

> A <- B <- matrix(1:9,nrow=3)

> dyn.load("src/matprod7.dll")

> mprod7_Rcpp(A, B)

[,1] [,2] [,3]

[1,] 30 66 102

[2,] 36 81 126

[3,] 42 96 150

> dyn.unload("src/matprod7.dll")

27

6.2 Example: Matrix multiplication using RcppArmadillo

1 #include <Rcpp.h>

2 #include <RcppArmadillo.h>

3

4 RcppExport SEXP matprod8( SEXP X_, SEXP Y_ ){

5 arma::mat X = Rcpp::as<arma::mat >(X_);

6 arma::mat Y = Rcpp::as<arma::mat >(Y_);

7 arma::mat ans = X * Y;

8 return Rcpp::wrap(ans);

9 }

R CMD SHLIB src_arma/matprod8.cpp

> dyn.load("src_arma/matprod8.dll")

> .Call("matprod8", A, B)

[,1] [,2] [,3]

[1,] 30 66 102

[2,] 36 81 126

[3,] 42 96 150

> dyn.unload("src_arma/matprod8.dll")

28

6.3 Example: Inverting a symmetric positive definite matrix using

RcppArmadillo

Consider inverting a symmetric

positive defininite matrix. An R approach to doing so is via a Cholesky decomposition

> spdinv_R <- function(X_) {chol2inv(chol(X_))}

An RcppArmadillo based implementation (where we do exploit that the matrix is

symmetric and positive definite):

1 /* File: spdinv -arma.cpp */

2 #include <Rcpp.h>

3 #include <RcppArmadillo.h>

4

5 RcppExport SEXP C_spdinv_arma ( SEXP X_ ){

6 arma::mat X = Rcpp::as<arma::mat >(X_);

7 arma::mat Xinv = arma::inv( arma::sympd(X) );

8 return(Rcpp::wrap(Xinv ));

9 }

R CMD SHLIB src_arma/spdinv-arma.cpp

29

Benchmarks

> library(MASS)

> library(Matrix)

> PP <- 10

> X <- cov(mvrnorm(1000, mu=rep(0,PP), Sigma=diag(1,PP)))

> Xm <- Matrix(X)

> library(rbenchmark)

> cols <- c("test", "replications", "elapsed", "relative")

> dyn.load("src_arma/spdinv-arma.dll")

> benchmark(spdinv_R(X), solve(X), solve(Xm), .Call("C_spdinv_arma", X),

+ columns=cols, replications=10000)

test replications elapsed relative

4 .Call("C_spdinv_arma", X) 10000 0.06 1.000

2 solve(X) 10000 0.92 15.333

3 solve(Xm) 10000 0.26 4.333

1 spdinv_R(X) 10000 0.88 14.667

> dyn.unload("src_arma/spdinv-arma.dll")

30

7 Further examples on using RcppArmadillo with inline

31

7.1 Example: Extracting submatrices

> library(inline)

> library(Rcpp)

> submat <- cxxfunction(signature(r_SS='float', r_rows='int', r_cols='int'), body = '+ using namespace arma;

+ using namespace Rcpp;

+ mat SS = as<mat>(r_SS);

+ uvec rowsi = as<uvec>(r_rows);

+ uvec colsi = as<uvec>(r_cols);

+ int has_col = colsi[0], has_row = rowsi[0];

+ if (has_col==0) {

+ if (has_row==0) {

+ mat yy = SS; return wrap(yy);

+ } else {

+ mat yy = SS.rows(rowsi-1); return wrap(yy);

+ }

+ } else {

+ if (has_row==0) {

+ mat yy = SS.cols(colsi-1); return wrap(yy);

+ } else {

+ mat yy = SS.submat(rowsi-1, colsi-1); return wrap(yy);

+ }

+ }

+ ', plugin='RcppArmadillo')

32

> M <- matrix(1:12,nr=3)

> submat(M, 0, 0)

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

> submat(M, 1:2, 0)

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

> submat(M, 0, 2:3)

[,1] [,2]

[1,] 4 7

[2,] 5 8

[3,] 6 9

> submat(M, 1:2, 2:3)

[,1] [,2]

[1,] 4 7

[2,] 5 8

33

8 Calling R from C++

> toString_ <- cxxfunction(signature(x_ = "", sep_=""), plugin = "Rcpp",

+ body = '+ using namespace Rcpp;

+ using namespace std;

+ CharacterVector x=as<CharacterVector>(x_);

+ CharacterVector sep=as<CharacterVector>(sep_);

+ std::string res;

+ for (int ii=0; ii<(x.length())-1; ii++){

+ res += x[ii];

+ res += as<string>(sep);

+ }

+ res += x[x.length()-1];

+ return(wrap(res));')> toString_(c("foo","bar","bob"),";")

[1] "foo;bar;bob"

> toString_(c(1,2,3),";")

[1] "1;2;3"

34

> get_index_ <- cxxfunction(signature(x_ = "SEXP", V_="SEXP"), plugin = "Rcpp",

+ body = '+ using namespace Rcpp;

+ using namespace std;

+ Function R_match("match");

+ CharacterVector V(V_);

+ List x(x_);

+ List res=List::create();

+ for (int ii=0; ii<x.length(); ii++){

+ CharacterVector ee = as<CharacterVector>(x[ii]);

+ IntegerVector out = R_match(wrap(ee), V_);

+ Rf_PrintValue(out);

+ res.push_back(out);

+ }

+ return(wrap(res));

+ ')> a<-get_index_(list(c("a","b"),c("b","c"), c("a","c","e"), c("a","c","q")), letters[1:10])

[1] 1 2

[1] 2 3

[1] 1 3 5

[1] 1 3 NA

> str(a)

List of 4

$ : int [1:2] 1 2

$ : int [1:2] 2 3

$ : int [1:3] 1 3 5

$ : int [1:3] 1 3 NA

35

36

9 Building packages using Rcpp libraries

Your friends are:

> library(Rcpp)

> Rcpp.package.skeleton()

> library(RcppArmadillo)

> RcppArmadillo.package.skeleton

37

10 EXERCISES

1. Using inline and RcppArmadillo, implement a function that calculates the

conditional variance in a multivariate normal distribution, i.e.

V ar(Ya|Yb) = Σaa − ΣabΣ−1bb Σba

2. Compare the performance in computing time with a pure R implementation:

> condVarR <- function(SS, aa, bb){

+ SS[aa,aa] - SS[aa,bb,drop=FALSE] %*% solve(SS[bb,bb,drop=FALSE],SS[bb,aa,drop=FALSE])

+ }

Recommended