C++ and Rcpp
“Rcpp”: How to use C++ with R

Jiaqi Xu
Department of Statistics
April 30, 2015


• What is C++?
  According to Wikipedia, C++ is a general-purpose programming language.

• What are its features?
  o Imperative
  o Object-oriented
  o Generic programming
  o Facilities for low-level memory manipulation

History of C++

• 1979: Bjarne Stroustrup (http://www.stroustrup.com/) begins work on the language, drawing on Simula

• An extension of C with object-oriented programming

• 1979-1991: C++'s design and early years (http://www.stroustrup.com/hopl2.pdf)

• 1991-2006: Evolving a language in and for the real world: C++ (http://www.stroustrup.com/hopl-almost-final.pdf)

Basics of a Typical C++ Environment

Phases of C++ Programs:

1. Edit: the program is created in the editor and stored on disk.

2. Preprocess: the preprocessor program processes the code.

3. Compile: the compiler creates object code and stores it on disk.

4. Link: the linker links the object code with the libraries.

5. Load: the loader puts the program in primary memory.

6. Execute: the CPU takes each instruction and executes it, possibly storing new data values as the program executes.

Computer Languages

• Three types of computer languages
  o Machine language
    - The only language a computer directly understands
    - The “natural language” of the computer
    - Cumbersome for humans
  o Assembly language
    - English-like abbreviations representing elementary computer operations
    - Clearer to humans
    - Incomprehensible to computers
  o High-level languages
    - Similar to everyday English, with common mathematical notation
    - Single statements accomplish substantial tasks
    - Translator programs (compilers)

Compiler for C++

• Some compilers that can be downloaded for free:
  o Apple C++. It also comes with OS X on the developer tools CD.
  o Bloodshed Dev-C++. A GCC-based (Mingw) IDE.
  o Cygwin (GNU C++)
  o …

• Some compilers that require payment (some allow free downloads for trial periods):
  o Embarcadero C++
  o Edison Design Group C++ Front End, used by many C++ compiler suppliers
  o Green Hills C++ for many embedded systems platforms

• C++ compiler online: http://webcompiler.cloudapp.net/

• C++ Shell: http://cpp.sh/

• Visual Studio 2013 (90-day free trial, 11 GB): https://www.visualstudio.com/en-us/downloads/download-visual-studio-vs.aspx

• Visual Studio 2015 (preview): https://www.visualstudio.com/en-us/downloads/visual-studio-2015-downloads-vs.aspx

• HPC g++ compiler (HPC notes P57): age.cpp, hello.cpp

CodeBlocks (IDE for C++)

http://www.codeblocks.org/

• Choose “Download the binary release”

• Select the file “codeblocks-13.12mingw-setup.exe” and download it from Sourceforge.net

• Save “codeblocks-13.12mingw-setup.exe” (size 97.8 MB)

Introduction Example

• hello.cpp (example 1.1)

#include <iostream>

using std::cout; /* or use: using namespace std; */
using std::endl;

int main()
{
    cout << "Hello World!" << endl; /* cout << "Hello World!\n"; prints the same line */
    return 0;
}

• Comments
  o Ignored by the compiler
  o Single-line comments begin with //
  o Multiple-line comments are enclosed in /* */

• Preprocessor directives
  o Processed by the preprocessor before compiling
  o Begin with #

• <iostream>
  o Standard Input / Output Streams Library

• Standard output stream object: std::cout
  o “Connected” to the screen
  o << is the stream insertion operator
  o The value to the right (the right operand) is inserted into the output stream

• Namespace
  o std:: specifies a name that belongs to “namespace” std
  o std:: can be omitted through the use of using statements

• Escape characters
  o \ indicates that a “special” character follows
  o \n is the newline character: it positions the screen cursor at the beginning of the next line

• return 0; // indicates that the program ended successfully

Data Type

• Integral
  o char (enclosed in single quotes, e.g. 'a' = 97)
  o int (typical ranges: short integer $-2^{15} \sim 2^{15}-1$, long integer $-2^{31} \sim 2^{31}-1$)

• Floating
  o float (single precision)
  o double
  o long double

• Address
  o pointer *
  o reference &

• Structured
  o array
  o struct
  o union
  o class

Sample Program

• DataType.cpp (example 1.2)

#include <iostream>
using namespace std;

int main(void)
{
    double wdth, height;   // declaration statement
    const int LEN = 5;     // constant: cannot be changed after initialization
    wdth = 10.0;
    height = 8.5;
    cout << "volume = " << LEN * wdth * height << endl;
}

Comments:

• double wdth, height; is a declaration statement. A declaration statement associates an identifier with a data object, a function, or a data type so that the programmer can refer to that item by name.

• const int LEN: the value of LEN cannot be changed after it is initialized.

Control Structures

• Sequence structure
  o Programs are executed sequentially by default

• Selection structures
  o if, if/else, switch

• Repetition structures
  o while, do/while, for

Switch Multiple-Selection Structure

• switch
  o Tests a variable against multiple values
  o Series of case labels and an optional default case

switch ( variable ) {
    case value1:      // taken if variable == value1
        statements
        break;        // necessary to exit switch

    case value2:
    case value3:      // taken if variable == value2 or == value3
        statements
        break;

    default:          // taken if variable matches no other cases
        statements
        break;
}

Example 1.3

/* Grade.cpp */
#include <cstdlib>   // system()
#include <iostream>
using namespace std;

int main()
{
    char let_grd;
    cout << "Please type in your grade" << endl;
    cin >> let_grd;
    switch (let_grd)
    {
        case 'A':
            cout << "Congratulations!";
            break;
        case 'B':
            cout << "Good job!";
            break;
        case 'C':
            cout << "OK, but you can do better!";
            break;
        case 'D':
            cout << "Better luck next time!";
            break;
        case 'F':
            cout << "Have fun in summer school!";
            break;
        default:
            cout << "You entered an invalid grade.";
    }
    system("PAUSE");   // Windows only: keeps the console window open
}

Boost Library Installation

Boost Version 1.58.0 (http://www.boost.org/users/history/version_1_58_0.html) (size 117 MB)

“Installation

To install Boost.Build from an official release or a nightly build, as available on the official web site, follow these steps:

• Unpack the release. On the command line, go to the root of the unpacked tree.

• Run either .\bootstrap.bat (on Windows), or ./bootstrap.sh (on other operating systems).

• Run ./b2 install --prefix=PREFIX where PREFIX is a directory where you want Boost.Build to be installed.

• Optionally, add PREFIX/bin to your PATH environment variable.”

(I use bootstrap.bat, then .\b2)

Installing this library takes about 30 minutes.

Boost Library Settings

• Code::Blocks 13.12

• Visual Studio

Example 1.4

• Use the Boost library to calculate the CDF of the normal distribution at a particular value

Code:

/* BoostCDF.cpp */
#include <cstdlib>   // system()
#include <iostream>
#include <boost/math/distributions/normal.hpp> // for normal_distribution

using boost::math::normal; // typedef whose default value type is double
using namespace std;

int main()
{
    double mean = 140.; // sacks per week
    double standard_deviation = 10;
    normal sacks(mean, standard_deviation);
    double stock = 160.; // per week
    cout << "Percentage of weeks overstocked "
         << cdf(sacks, stock) * 100. << endl; // P(X <= 160)
    // Percentage of weeks overstocked: about 97.7
    system("pause"); // Windows only: keeps the console window open
}

• What is Rcpp?
  An R package that provides R functions as well as C++ classes, which together offer a seamless integration of R and C++.

• What are the advantages compared to R?
  o Recursive functions: the overhead of calling a function is much lower in C++ than in R
  o Advanced data structures and algorithms that R doesn’t provide
  o Loops that can’t be easily vectorised, because later iterations depend on earlier ones

Install Rcpp in R

• RStudio

• Install the Rcpp package

• Install Rtools

Why Rcpp? A short example

• Review the previous C++ part

• See how to write a C++ function in R

• Compare the running speed to R

Example: The Fibonacci sequence is defined as a recursive sum of the two preceding terms in the same sequence:

$F_n = F_{n-1} + F_{n-2}$

with these two initial conditions:

$F_0 = 0, \quad F_1 = 1$

so that the first ten numbers of the sequence, $F_0$ to $F_9$, are seen to be 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.

Use R, C++ and Rcpp to write a function that finds Fibonacci numbers with the same (recursive) method, then compare their running times.

Why Rcpp? (example 2.1)

• In R, the recursive function is:

fibR <- function(n) {
  if (n == 0) return(0)
  if (n == 1) return(1)
  return(fibR(n - 1) + fibR(n - 2))
}

• In C++, the code is:

#include <iostream>
using namespace std;

int fibonacci(const int x) {
    if (x == 0) return 0;
    if (x == 1) return 1;
    return fibonacci(x - 1) + fibonacci(x - 2);
}

int main() {
    cout << "fibonacci 30 is " << fibonacci(30) << endl;
}

• In Rcpp, with cppFunction():

library(Rcpp)
cppFunction('
int fibonacci(const int x) {
    if (x == 0) return 0;
    if (x == 1) return 1;
    return fibonacci(x - 1) + fibonacci(x - 2);
}')

• In Rcpp, with a cpp file (load it in R with sourceCpp()):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int fibonacci2(const int x) {
    if (x < 2) return x;
    return fibonacci2(x - 1) + fibonacci2(x - 2);
}

Why Rcpp? (example 2.1)

• Comparing the running time

Calculate the running time in C++:

#include <ctime>
#include <iostream>
using namespace std;

int fibonacci(const int x) {
    if (x == 0) return 0;
    if (x == 1) return 1;
    return fibonacci(x - 1) + fibonacci(x - 2);
}

int main() {
    clock_t start_s = clock();
    int result = fibonacci(30);   // timed call; the result is kept and printed below
    clock_t end_s = clock();
    double cpu_time_used = ((double) (end_s - start_s)) / CLOCKS_PER_SEC;   // in seconds
    cout << "fibonacci 30 is " << result << endl;
    cout << "time: " << cpu_time_used << endl;
}

• Timing options in R: benchmark, Sys.time(), proc.time(), system.time()

Code in RStudio:

fibR.time <- system.time(fibR(30))
fibonacci2.time <- system.time(fibonacci2(30))
# third entry: the C++ running time in seconds, entered by hand
ctime <- c(fibR.time[3], fibonacci2.time[3], 0.012)
spdup <- round(ctime[1] / ctime, 2)   # speed-up relative to R
names(spdup) <- c("Use R", "Rcpp", "C++")
midpoints <- barplot(spdup, ylab = "Speed up")
text(midpoints, c(100, 200, 800), labels = spdup)   # label heights chosen by hand for this plot


Useful Variable Types & Control Structures for Rcpp

• Rules for writing a function in Rcpp:
  o You must declare the type of output the function returns. The fibonacci function above returns an int (a scalar integer).
  o Scalars and vectors are different. The scalar equivalents of numeric, integer, character, and logical vectors are double, int, String, and bool.
  o You must use an explicit return statement to return a value from a function.
  o Every statement is terminated by a ;.
  o Put // [[Rcpp::export]] (with no trailing semicolon) directly before each function in an Rcpp file that should be callable from R.

Variable Types & Loop Functions

Variable Types

• Scalar: int, double, bool, String

• Vector: NumericVector, IntegerVector, CharacterVector, LogicalVector

• Matrix: IntegerMatrix, NumericMatrix, LogicalMatrix, CharacterMatrix

Loop Functions

• While loop

while (condition)
{
    statement(s);
}

• For loop

for ( init; condition; increment )
{
    statement(s);
}

• Do while loop

do
{
    statement(s);
} while ( condition );

• Nested loop: double for loops, double do while loops

Example 2.2

• Write a function to calculate the mean

o For a vector, use a for loop:

#include <Rcpp.h>
using namespace Rcpp;

/* mean of a vector */
// [[Rcpp::export]]
double rowmeanC(NumericVector x) {
    int n = x.size();
    double total = 0;
    for (int i = 0; i < n; ++i) {
        total += x[i];
    }
    return total / n;
}

o For a matrix, use a do while loop with a nested for loop:

/* row means of a matrix; assumes at least one row, because do while runs its body before testing */
// [[Rcpp::export]]
NumericVector rowmeanC2(NumericMatrix x) {
    int nrow = x.nrow(), ncol = x.ncol();
    NumericVector out(nrow);
    /* same as for (int i = 0; i < nrow; i++) { ... } when nrow > 0 */
    int i = 0;
    do {
        double total = 0;
        for (int j = 0; j < ncol; j++) {
            total += x(i, j);
        }
        out[i] = total;
        i++;
    } while (i < nrow);
    return out / ncol;
}

Rcpp Sugar

• “Sugar”
  o Syntactic “sugar” that makes C++ functions work very similarly to their R equivalents.

• Use sugar functions
  o The code will be both expressive and well tested.

Advantage: sugar functions may get faster in the future as more time is spent on optimising Rcpp.

Disadvantage: they aren’t always faster than a handwritten equivalent.

Rcpp Sugar Functions

A grab bag of sugar functions that mimic frequently used R functions:

• Math functions: abs(), acos(), asin(), atan(), beta(), ceil(), ceiling(), choose(), cos(), cosh(), digamma(), exp(), expm1(), factorial(), floor(), gamma(), lbeta(), lchoose(), lfactorial(), lgamma(), log(), log10(), log1p(), pentagamma(), psigamma(), round(), signif(), sin(), sinh(), sqrt(), tan(), tanh(), tetragamma(), trigamma(), trunc().

• Scalar summaries: mean(), min(), max(), sum(), sd(), and (for vectors) var().

• Vector summaries: cumsum(), diff(), pmin(), and pmax().

• Finding values: match(), self_match(), which_max(), which_min().

• Dealing with duplicates: duplicated(), unique().

• d/q/p/r for all standard distributions.

Example 2.3

#include <Rcpp.h>
using namespace Rcpp;

/* Draws n standard normal values and clamps each one to [low, high]. */
// [[Rcpp::export]]
NumericVector getNTN(int n) {
    NumericVector draw(n);
    double high = 3;
    double low = -3;
    NumericVector cand1 = rnorm(n, 0, 1);   // sugar: n draws from N(0, 1)
    for (int i = 0; i < n; i++) {
        if (cand1(i) >= high)
            draw(i) = high;
        else if (cand1(i) <= low)
            draw(i) = low;
        else
            draw(i) = cand1(i);
    }
    return draw;
}

More About Rcpp?

• The package comes with nine pdf vignettes and numerous help pages.

> vignette(package="Rcpp")
Vignettes in package ‘Rcpp’:
Rcpp-attributes, Rcpp-extending, Rcpp-FAQ, Rcpp-introduction, Rcpp-modules, Rcpp-package, Rcpp-quickref, Rcpp-sugar, Rcpp-unitTests

• Rcpp online tutorial (http://adv-r.had.co.nz/Rcpp.html)

• Rcpp Gallery (http://gallery.rcpp.org/)

• Book

Reference

• Bjarne Stroustrup (http://www.stroustrup.com/)

• Rcpp online tutorial (http://adv-r.had.co.nz/Rcpp.html)

• Rcpp Gallery (http://gallery.rcpp.org/)

• “Seamless R and C++ Integration with Rcpp” by Dirk Eddelbuettel.

Thanks!

Questions?
