Interfacing C++ code from R Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark November 20, 2012 Printed: November 20, 2012 File: interfaceCpp-slides.tex


  • Interfacing C++ code from R

    Søren Højsgaard

    Department of Mathematical Sciences

    Aalborg University, Denmark

    November 20, 2012

    Printed: November 20, 2012 File: interfaceCpp-slides.tex

  • 2

    Contents

    1 Calling C++ from R 3

    2 Some important libraries and packages 4

    3 Example: The exponential function 6
      3.1 Using Rcpp together with inline 8

    4 Example: Calculating Fibonacci recursively 12

    5 Example: Matrix multiplication 15
      5.1 Using Rcpp together with inline 17
      5.2 Using RcppArmadillo together with inline 18
      5.3 Benchmarking – III 19

    6 Compiling without using inline 21
      6.1 Example: Matrix multiplication using Rcpp 25
      6.2 Example: Matrix multiplication using RcppArmadillo 27
      6.3 Example: Inverting a symmetric positive definite matrix using RcppArmadillo 28

    7 Further examples on using RcppArmadillo with inline 30
      7.1 Example: Extracting submatrices 31

    8 Calling R from C++ 33

    9 Building packages using Rcpp libraries 36

    10 EXERCISES 37

  • 3

    1 Calling C++ from R

    C++ code can be called from R.

    • Easily done using the Rcpp package. (There are other ways, not to be discussed here.)

    • The Rcpp package combines nicely with the inline package.

    • There are several existing libraries available with a C++ interface that we may use instead of “reinventing the wheel” (as we did when creating our own matrix multiplication functions).

  • 4

    2 Some important libraries and packages

    • Armadillo is an open-source C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use.

    http://arma.sourceforge.net/

    R interface via RcppArmadillo

    • Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

    http://eigen.tuxfamily.org/

    R interface via RcppEigen

    • The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers.

    http://www.gnu.org/software/gsl/

    R interface via RcppGSL


  • 5

    Notice:

    • To use these packages we must program in C++ rather than in C.

    • Dirk Eddelbuettel provides many examples on his website: http://dirk.eddelbuettel.com/code/rcpp.html

  • 6

    3 Example: The exponential function

    Recall our implementation of the exponential function in R:

    > expfunR

  • 7

    A pure C implementation (which ignores the numerical difficulties when x < 0) is:

    #include <math.h>   /* fabs() */
    double C_expfun2(double x){
      double ans=1.0, term=1.0, eps=1e-16;
      int n=0;
      while (fabs(term)>eps){   /* stop once the next term is negligible */
        n++;
        term = term * x / n;    /* x^n / n! from the previous term */
        ans = ans + term;
      }
      return(ans);
    }

  • 8

    3.1 Using Rcpp together with inline

    > src library(inline)

    > expfunC

  • 9

    An alternative implementation reuses the original function, once the numerical difficulties that arise when x < 0 have been resolved:

    > incltxt expfunC2
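    The R-side code for expfunC2 is not shown in this copy of the slides. As an illustration only, the sketch below shows one common way to resolve the x < 0 problem in plain C++ (standard library, no Rcpp): sum the series for |x|, where all terms are positive, and take the reciprocal when x is negative. Whether expfunC2 does exactly this is an assumption, and the names expfun_series and expfun_safe are hypothetical.

```cpp
#include <cmath>

// Taylor series 1 + x + x^2/2! + ... ; well behaved when x >= 0.
double expfun_series(double x) {
    double ans = 1.0, term = 1.0;
    const double eps = 1e-16;
    int n = 0;
    while (std::fabs(term) > eps) {
        ++n;
        term = term * x / n;  // x^n / n! obtained from the previous term
        ans += term;
    }
    return ans;
}

// For x < 0 the series terms alternate in sign and cancel badly,
// so evaluate exp(-x) instead and return its reciprocal.
double expfun_safe(double x) {
    return (x < 0.0) ? 1.0 / expfun_series(-x) : expfun_series(x);
}
```

    With this arrangement the relative error at x = -20 should be of the same order as at x = 20, since both calls sum the same positive terms.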

  • 10

    > xx <- 1

    > c(expfunR(xx), expfunC(xx), expfunC2(xx))

    [1] 2.718282 2.718282 2.718282

    > xx <- 20

    > c(expfunR(xx), expfunC(xx), expfunC2(xx))

    [1] 485165195 485165195 485165195

    > xx <- -20

    > c(expfunR(xx), expfunC(xx), expfunC2(xx))

    [1] 2.061154e-09 5.621884e-09 2.061154e-09

    > (expfunC(-20)-exp(-20))/exp(-20)

    [1] 1.727543

    > (expfunC2(-20)-exp(-20))/exp(-20)

    [1] 2.006596e-16

    >

  • 11

    > library(rbenchmark)

    > cols <- c("test", "replications", "elapsed", "relative")

    > N <- 20000

    > benchmark(expfunR(10), expfunC(10), expfunC2(10),

    + columns=cols, order="relative", replications=N)

              test replications elapsed relative
    2  expfunC(10)        20000    0.08      1.0
    3 expfunC2(10)        20000    0.08      1.0
    1  expfunR(10)        20000    2.44     30.5

  • 12

    4 Example: Calculating Fibonacci recursively

    The standard definition of the Fibonacci sequence is $f_n = f_{n-1} + f_{n-2}$, where $f_0 = 0$ and $f_1 = 1$.

    A simple recursive implementation in R is:

    > fibR

  • 13

    A fast C++ version is easy to write:

    > incltxt fibC
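    The body of fibC is missing from this copy of the slides. As a rough sketch, the recursion translates almost word for word into plain C++. The function name fib_recursive is hypothetical, and the sketch assumes n >= 0.

```cpp
// Naive recursion that mirrors the definition
// f_n = f_{n-1} + f_{n-2}, with f_0 = 0 and f_1 = 1.
// Assumes n >= 0; the running time grows exponentially in n.
long long fib_recursive(int n) {
    if (n < 2) return n;  // base cases f_0 = 0 and f_1 = 1
    return fib_recursive(n - 1) + fib_recursive(n - 2);
}
```

    For example, fib_recursive(10) evaluates to 55. The algorithm is the same as in the R version, so any speed difference comes from the cost of a function call in each language rather than from a smarter method.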

  • 14

    > library(rbenchmark)

    > cols M benchmark(fibR(M), fibC(M),

    + columns=cols, order="relative", replications=1)

         test replications elapsed relative
    1 fibR(M)            1    7.86       NA
    2 fibC(M)            1    0.00       NA

  • 15

    5 Example: Matrix multiplication

    Recall our first version of the matrix multiplication function:

    /* File: matprod1.c: Calculate the product of matrices X and Y */
    void matprod1(double *X, int *dimX, double *Y, int *dimY, double *ans){
      double sum;
      int ii, jj, kk;
      int nrX=dimX[0], ncX=dimX[1], nrY=dimY[0], ncY=dimY[1];

      for (ii=0; ii
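    The listing above breaks off at the start of its loop in this copy of the slides. The following is a minimal sketch of the kind of triple loop such a function needs, written in plain C++ with the matrices stored column by column, as R stores them. It is not the author's matprod1: the name matprod_colmajor and the use of std::vector are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// X is nrX x ncX and Y is ncX x ncY, both stored column by column,
// so element (i, j) of an n-row matrix lives at position i + n * j.
std::vector<double> matprod_colmajor(const std::vector<double>& X, int nrX, int ncX,
                                     const std::vector<double>& Y, int ncY) {
    std::vector<double> ans(static_cast<std::size_t>(nrX) * ncY, 0.0);
    for (int ii = 0; ii < nrX; ++ii) {
        for (int jj = 0; jj < ncY; ++jj) {
            double sum = 0.0;
            for (int kk = 0; kk < ncX; ++kk) {
                sum += X[ii + nrX * kk] * Y[kk + ncX * jj];  // X(ii,kk) * Y(kk,jj)
            }
            ans[ii + nrX * jj] = sum;
        }
    }
    return ans;
}
```

    The sketch does no dimension checking; the caller must make sure that the number of columns of X equals the number of rows of Y.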

  • 16

    An interface using SEXPs is:

    /* File: matprod2.c: Calculates the product of matrices X and Y */
    #include <Rdefines.h>
    #include "matprod1.h"

    SEXP matprod2(SEXP X, SEXP Y) {
      int nprot=0;
      PROTECT(X = AS_NUMERIC(X)); nprot++; /* Digest SEXPs from R */
      PROTECT(Y = AS_NUMERIC(Y)); nprot++;
      double *xptr; xptr = REAL(X);
      double *yptr; yptr = REAL(Y);
      int *dimX; dimX = INTEGER(GET_DIM(X));
      int *dimY; dimY = INTEGER(GET_DIM(Y));
      SEXP ans; /* Create SEXP to hold result */
      PROTECT(ans = allocMatrix(REALSXP, dimX[0], dimY[1])); nprot++;
      double *ansptr; ansptr = REAL(ans);
      matprod1(xptr, dimX, yptr, dimY, ansptr); /* Calculate product */
      UNPROTECT(nprot); /* Wrap up */
      return(ans); /* Return the result to R */
    }

    where the header file matprod1.h declares:

    void matprod1(double *X, int *dimX, double *Y, int *dimY, double *ans);

  • 17

    5.1 Using Rcpp together with inline

    With Rcpp, matrices can be indexed in the usual way:

    > src library(inline)

    > mprod5_inline_Rcpp

  • 18

    5.2 Using RcppArmadillo together with inline

    The Armadillo library is an excellent C++ library for linear algebra, and RcppArmadillo makes it easy to use from R:

    > src mprod6_inline_RcppArma

  • 19

    5.3 Benchmarking – III

    > library(rbenchmark)

    > cols N A

  • 20

    Tentative conclusions on the benchmarking:

    • Speedwise, .Call() is better than .C().

    • Speedwise, C is better than C++.

    • The execution time of a program must be traded off against the programming time needed to make the program work.

    • For larger matrices, our own “homegrown” C code seems to lose to the other competitors.

  • 21

    6 Compiling without using inline

    Rcpp based code can be compiled using R CMD SHLIB.

    To do so, one must tell the compiler where to find the headers and tell the linker

    which libraries to link against and where to find them. One way of doing so is by

    creating a Makevars file.

    Below are the Makevars files that get things to work on Windows and Linux:

    Using Rcpp: Using Rcpp alone, a Makevars file with these lines works on both Linux and Windows:

    PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
    PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`

  • 22

    Using RcppArmadillo: For compilation on Windows, the file Makevars.win contains these lines:

    PKG_LIBS = $(BLAS_LIBS) $(FLIBS) $(LAPACK_LIBS) \

    $(shell "Rscript.exe" -e "Rcpp:::LdFlags()")

    PKG_CPPFLAGS = -I${R_HOME}/include -I${R_HOME}/library/Rcpp/include \

    -I${R_HOME}/library/RcppArmadillo/include -I. -DNDEBUG

    For compilation on Linux, the Makevars file contains the lines:

    PKG_LIBS = $(BLAS_LIBS) $(FLIBS) $(LAPACK_LIBS) \

    $(shell "Rscript" -e "Rcpp:::LdFlags()")

    ## If Rcpp etc. are installed in /usr/local/lib/R/site-library

    R_SITE=/usr/local/lib/R/site-library

    PKG_CPPFLAGS = -I${R_HOME}/include -I${R_SITE}/Rcpp/include \

    -I${R_SITE}/RcppArmadillo/include -I. -DNDEBUG

    ## If Rcpp etc. are installed in /usr/lib/R/ use instead:

    ### PKG_CPPFLAGS = -I${R_HOME}/include -I${R_HOME}/library/Rcpp/include \

    ### -I${R_HOME}/library/RcppArmadillo/include -I. -DNDEBUG

  • 23

    Using RcppEigen: For compilation on Windows, the file Makevars.win contains these lines:

    PKG_LIBS = $(BLAS_LIBS) $(FLIBS) \

    $(shell "Rscript.exe" -e "Rcpp:::LdFlags()")

    PKG_CPPFLAGS = -I${R_HOME}/library/RcppEigen/include \

    -I${R_HOME}/library/Rcpp/include -I. -DNDEBUG

    For compilation on Linux, the Makevars file contains the lines:

    PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"`

    ## If Rcpp etc. are installed in /usr/local/lib/R/site-library

    R_SITE=/usr/local/lib/R/site-library

    PKG_CPPFLAGS = -I${R_HOME}/include -I${R_SITE}/Rcpp/include \

    -I${R_SITE}/RcppEigen/include -I. -DNDEBUG

    ## If Rcpp etc. are installed in /usr/lib/R/ use instead:

    ## PKG_CPPFLAGS = -I${R_HOME}/library/Rcpp/include \

    ## -I${R_HOME}/library/RcppEigen/include -I. -DNDEBUG

  • 24

    NOTICE: For easier reading, a long line is broken up by putting a backslash ("\") immediately followed by Enter as the last characters of the physical line; the rest of the logical line then continues on the next physical line. (Hence there must be no white space following the backslash.)

  • 25

    6.1 Example: Matrix multiplication using Rcpp

    With Rcpp, matrices can be indexed in the usual way. Consider the file:

    // File: matprod7.cpp
    #include <Rcpp.h>

    RcppExport SEXP matprod7(SEXP X_, SEXP Y_){
      Rcpp::NumericMatrix X(X_);
      Rcpp::NumericMatrix Y(Y_);
      Rcpp::NumericMatrix ans(X.nrow(), Y.ncol());
      int ii, jj, kk;
      for (ii=0; ii

  • 26

    To compile this file, first create a file named Makevars with the content (notice the back quotes):

    PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
    PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`

    Next, compile the file as usual:

    R CMD SHLIB src/matprod7.cpp

    which creates matprod7.dll (on Windows) or matprod7.so (on Linux).

    > mprod7_Rcpp A mprod7_Rcpp(A, B)

         [,1] [,2] [,3]
    [1,]   30   66  102
    [2,]   36   81  126
    [3,]   42   96  150

    > dyn.unload("src/matprod7.dll")

  • 27

    6.2 Example: Matrix multiplication using RcppArmadillo

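    #include <RcppArmadillo.h>   // also pulls in the Rcpp headers

    RcppExport SEXP matprod8(SEXP X_, SEXP Y_){
      arma::mat X = Rcpp::as<arma::mat>(X_);   // convert the R matrices to Armadillo matrices
      arma::mat Y = Rcpp::as<arma::mat>(Y_);
      arma::mat ans = X * Y;
      return Rcpp::wrap(ans);                  // convert the result back to an R object
    }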

    R CMD SHLIB src_arma/matprod8.cpp

    > dyn.load("src_arma/matprod8.dll")

    > .Call("matprod8", A, B)

         [,1] [,2] [,3]
    [1,]   30   66  102
    [2,]   36   81  126
    [3,]   42   96  150

    > dyn.unload("src_arma/matprod8.dll")

  • 28

    6.3 Example: Inverting a symmetric positive definite matrix using RcppArmadillo

    Consider inverting a symmetric positive definite matrix. One approach in R is via a Cholesky decomposition:

    > spdinv_R
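    Neither the body of spdinv_R nor the Armadillo source is included in this copy of the slides. To make the Cholesky route concrete, here is a minimal sketch in plain C++ (standard library only, no R or Armadillo): factor A = L L^T, invert the triangular factor, and form A^{-1} = (L^{-1})^T L^{-1}. It illustrates the idea and is not the code that was benchmarked; the names cholesky_lower and spd_inverse are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // M[i][j]: row i, column j

// Lower-triangular Cholesky factor L with A = L * L^T.
Matrix cholesky_lower(const Matrix& A) {
    const std::size_t n = A.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (d <= 0.0) throw std::domain_error("matrix is not positive definite");
        L[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    return L;
}

// Inverse of a symmetric positive definite matrix via its Cholesky factor.
Matrix spd_inverse(const Matrix& A) {
    const std::size_t n = A.size();
    const Matrix L = cholesky_lower(A);

    // Solve L * Linv = I column by column using forward substitution.
    Matrix Linv(n, std::vector<double>(n, 0.0));
    for (std::size_t c = 0; c < n; ++c) {
        for (std::size_t i = c; i < n; ++i) {
            double s = (i == c) ? 1.0 : 0.0;
            for (std::size_t k = c; k < i; ++k) s -= L[i][k] * Linv[k][c];
            Linv[i][c] = s / L[i][i];
        }
    }

    // inv(i,j) = sum_k Linv(k,i) * Linv(k,j); Linv is lower triangular,
    // so only k >= max(i, j) contributes.
    Matrix inv(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (std::size_t k = (i > j ? i : j); k < n; ++k) s += Linv[k][i] * Linv[k][j];
            inv[i][j] = s;
        }
    }
    return inv;
}
```

    The sketch assumes a square, symmetric input and reads only its lower triangle; it throws if a pivot is not positive.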

  • 29

    Benchmarks

    > library(MASS)

    > library(Matrix)

    > PP X Xm library(rbenchmark)

    > cols dyn.load("src_arma/spdinv-arma.dll")

    > benchmark(spdinv_R(X), solve(X), solve(Xm), .Call("C_spdinv_arma", X),

    + columns=cols, replications=10000)

                           test replications elapsed relative
    4 .Call("C_spdinv_arma", X)        10000    0.06    1.000
    2                  solve(X)        10000    0.92   15.333
    3                 solve(Xm)        10000    0.26    4.333
    1               spdinv_R(X)        10000    0.88   14.667

    > dyn.unload("src_arma/spdinv-arma.dll")

  • 30

    7 Further examples on using RcppArmadillo with inline

  • 31

    7.1 Example: Extracting submatrices

    > library(inline)

    > library(Rcpp)

    > submat

  • 32

    > M submat(M, 0, 0)

         [,1] [,2] [,3] [,4]
    [1,]    1    4    7   10
    [2,]    2    5    8   11
    [3,]    3    6    9   12

    > submat(M, 1:2, 0)

         [,1] [,2] [,3] [,4]
    [1,]    1    4    7   10
    [2,]    2    5    8   11

    > submat(M, 0, 2:3)

         [,1] [,2]
    [1,]    4    7
    [2,]    5    8
    [3,]    6    9

    > submat(M, 1:2, 2:3)

         [,1] [,2]
    [1,]    4    7
    [2,]    5    8
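    The definition of submat is not included in this copy of the slides, but the calls above suggest its convention: indices are 1-based, and a single 0 means "take all rows" (or all columns). The sketch below reproduces that convention in plain C++ without Armadillo. The convention is inferred from the printed output only, and the names resolve_index and submat_sketch are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // M[i][j]: row i, column j

// Turn a list of 1-based indices into 0-based positions;
// the single value 0 selects every position in the given extent.
std::vector<std::size_t> resolve_index(const std::vector<int>& idx, std::size_t extent) {
    std::vector<std::size_t> out;
    if (idx.size() == 1 && idx[0] == 0) {
        for (std::size_t i = 0; i < extent; ++i) out.push_back(i);
    } else {
        for (int i : idx) out.push_back(static_cast<std::size_t>(i - 1));
    }
    return out;
}

// Extract the rows and columns named by the two index lists (no bounds checking).
Matrix submat_sketch(const Matrix& M, const std::vector<int>& rows,
                     const std::vector<int>& cols) {
    const std::size_t nrow = M.size();
    const std::size_t ncol = M.empty() ? 0 : M[0].size();
    const std::vector<std::size_t> r = resolve_index(rows, nrow);
    const std::vector<std::size_t> c = resolve_index(cols, ncol);
    Matrix out(r.size(), std::vector<double>(c.size()));
    for (std::size_t i = 0; i < r.size(); ++i)
        for (std::size_t j = 0; j < c.size(); ++j)
            out[i][j] = M[r[i]][c[j]];
    return out;
}
```

    Applied to a 3 x 4 matrix filled column by column with 1, ..., 12, the call submat_sketch(M, {1, 2}, {2, 3}) picks out rows 1 to 2 and columns 2 to 3, matching the last result shown above.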

  • 33

    8 Calling R from C++

    > toString_ toString_(c("foo","bar","bob"),";")

    [1] "foo;bar;bob"

    > toString_(c(1,2,3),";")

    [1] "1;2;3"

  • 34

    > get_index_ a str(a)

    List of 4

    $ : int [1:2] 1 2

    $ : int [1:2] 2 3

    $ : int [1:3] 1 3 5

    $ : int [1:3] 1 3 NA

  • 35

  • 36

    9 Building packages using Rcpp libraries

    Your friends are:

    > library(Rcpp)

    > Rcpp.package.skeleton()

    > library(RcppArmadillo)

    > RcppArmadillo.package.skeleton()

  • 37

    10 EXERCISES

    1. Using inline and RcppArmadillo, implement a function that calculates the conditional variance in a multivariate normal distribution, i.e.

    $\mathrm{Var}(Y_a \mid Y_b) = \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba}$

    2. Compare the performance in computing time with a pure R implementation:

    > condVarR