C++ and Rcpp
“Rcpp”: How to use C++ with R
Jiaqi Xu
Department of Statistics
April 30, 2015
Part I. Brief Introduction to C++
• What is C++?
  According to Wikipedia, C++ is a general-purpose programming language.
• What are its features?
  o Imperative
  o Object-oriented
  o Generic programming
  o Facilities for low-level memory manipulation
History of C++
• 1979: Bjarne Stroustrup (http://www.stroustrup.com/) begins work on the language, drawing on Simula
• An extension of C with object-oriented programming
• 1979-1991: C++'s design and early years (http://www.stroustrup.com/hopl2.pdf)
• 1991-2006: Evolving a language in and for the real world: C++ (http://www.stroustrup.com/hopl-almost-final.pdf)
Basics of a Typical C++ Environment
Phases of C++ programs:
1. Edit: the program is created in the editor and stored on disk.
2. Preprocess: the preprocessor program processes the code.
3. Compile: the compiler creates object code and stores it on disk.
4. Link: the linker links the object code with the libraries.
5. Load: the loader puts the program in primary memory.
6. Execute: the CPU takes each instruction and executes it, possibly storing new data values as the program executes.
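To watch the phases one at a time, a small file such as the one below can be pushed through g++ step by step. This is a minimal sketch: the file name phases.cpp, the SQUARE macro and the object file main.o are made up for illustration; -E and -c are the g++ options that stop after preprocessing and after compilation.

```cpp
// phases.cpp: a tiny file for watching the build phases with g++.
//   g++ -E phases.cpp -o phases.ii    (preprocess only: macros and #include expanded)
//   g++ -c phases.cpp -o phases.o     (compile to object code)
//   g++ phases.o main.o -o phases     (link; main.o is a hypothetical object file defining main)
#include <cassert>   // assert() is itself a preprocessor macro

#define SQUARE(x) ((x) * (x))   // replaced textually by the preprocessor

// After preprocessing, the return statement reads: return ((n) * (n));
int square(int n) {
    assert(n > -46341 && n < 46341);   // keeps n * n inside the 32-bit int range
    return SQUARE(n);
}
```

The parentheses in the macro matter because the replacement is purely textual: SQUARE(1 + 2) becomes ((1 + 2) * (1 + 2)).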
Computer Languages
• Three types of computer languages
  o Machine language
    - The only language a computer directly understands
    - The “natural language” of the computer
    - Cumbersome for humans
  o Assembly language
    - English-like abbreviations representing elementary computer operations
    - Clearer to humans
    - Incomprehensible to computers
  o High-level languages
    - Similar to everyday English; use common mathematical notation
    - Single statements accomplish substantial tasks
    - Translated by translator programs (compilers)
Compilers for C++
• Some compilers that can be downloaded for free:
  o Apple C++. It also comes with OS X on the developer tools CD.
  o Bloodshed Dev-C++. A GCC-based (MinGW) IDE.
  o Cygwin (GNU C++)
  …
• Some compilers that require payment (some allow free downloads for trial periods):
  o Embarcadero C++
  o Edison Design Group C++ Front End - used by many C++ compiler suppliers
  o Green Hills C++ for many embedded systems platforms
  …
• C++ compiler online: http://webcompiler.cloudapp.net/
• C++ Shell: http://cpp.sh/
• Visual Studio 2013 (90-day free trial, 11 GB): https://www.visualstudio.com/en-us/downloads/download-visual-studio-vs.aspx
• Visual Studio 2015 (preview): https://www.visualstudio.com/en-us/downloads/visual-studio-2015-downloads-vs.aspx
• HPC g++ compiler (HPC notes, p. 57): age.cpp, hello.cpp
IDE for C++: Code::Blocks
• http://www.codeblocks.org/
• Choose “Download the binary release”
• Select the file “codeblocks-13.12mingw-setup.exe” and download it from Sourceforge.net
• Save “codeblocks-13.12mingw-setup.exe” (size 97.8 MB)
Introduction Example
• hello.cpp (example 1.1)
#include <iostream>
using std::cout; /* or use: using namespace std; */
using std::endl;
int main()
{
    cout << "Hello World!" << endl; /* cout << "Hello World!\n"; prints the same text when typed with straight quotes */
    return 0;
}
Comments
  - Ignored by the compiler
  - Single-line comments begin with //
  - Multi-line comments are enclosed in /* */
Preprocessor directives
  - Processed by the preprocessor before compiling
  - Begin with #
<iostream>
  - Standard Input / Output Streams Library
Standard output stream object std::cout
  – “Connected” to the screen
  – <<
    • Stream insertion operator
    • The value to the right (right operand) is inserted into the output stream
Namespace
  – std:: specifies a name that belongs to “namespace” std
  – std:: can be omitted through the use of using statements
Escape characters
  – \ indicates “special” character output
  – \n: new line. Positions the screen cursor at the beginning of the next line.
return 0; // indicates that the program ended successfully
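The points about std::cout, the << operator and \n can be tried out with a short sketch like the following. The function names greet and greeting_text are invented for this illustration; writing to a std::ostream& rather than directly to std::cout is a design choice so that the output can also be captured in a string.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Writes a two-line greeting to any output stream (std::cout, a string buffer, ...).
void greet(std::ostream& out) {
    assert(out.good());                    // the stream must be usable
    out << "Hello" << ' ' << "World!\n";   // \n moves to the start of the next line
    out << "Bye" << std::endl;             // std::endl writes \n and flushes the stream
}

// Captures what greet() would print, so the text can be inspected.
std::string greeting_text() {
    std::ostringstream buffer;
    greet(buffer);
    return buffer.str();
}
```

Calling greet(std::cout) prints the same two lines on the screen. Every name is written with the std:: prefix here because the sketch has no using statements.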
Data Type
• Integral
  o char (enclosed in single quotes, e.g. 'a' = 97)
  o int (short integer: $-2^{15}$ to $2^{15}-1$; long integer: $-2^{31}$ to $2^{31}-1$)
• Floating
  o float (single precision)
  o double
  o long double
• Address
  o pointer *
  o reference &
• Structured
  o array
  o struct
  o union
  o class
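The ranges and the address types above can be checked with a small sketch. It assumes the fixed-width types std::int16_t and std::int32_t from <cstdint> as stand-ins for the 16-bit and 32-bit integers, since the actual sizes of short and long vary by platform. The function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// A 16-bit integer spans -2^15 .. 2^15 - 1; a 32-bit integer spans -2^31 .. 2^31 - 1.
static_assert(std::numeric_limits<std::int16_t>::min() == -32768, "16-bit minimum");
static_assert(std::numeric_limits<std::int16_t>::max() == 32767, "16-bit maximum");
static_assert(std::numeric_limits<std::int32_t>::max() == 2147483647, "32-bit maximum");

// A char is a small integer: in ASCII-based encodings 'a' has the value 97.
int char_code(char c) {
    return static_cast<int>(c);
}

// A pointer stores an address; a reference is another name for an object.
int pointer_and_reference_demo() {
    int value = 10;
    int* ptr = &value;     // ptr holds the address of value
    int& ref = value;      // ref is an alias for value
    *ptr += 1;             // changes value through the pointer
    ref += 1;              // changes value through the reference
    assert(ptr == &ref);   // both refer to the same object
    return value;
}
```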
Sample Program
• DataType.cpp (example 1.2)
#include <iostream>
using namespace std;
int main(void)
{
    double wdth, height;
    const int LEN = 5;
    wdth = 10.0;
    height = 8.5;
    cout << "volume = " << LEN * wdth * height << endl;
}
Comments:
• double wdth, height; is a declaration statement. A declaration statement associates an identifier with a data object, a function, or a data type so that the programmer can refer to that item by name.
• const int LEN: the value of LEN cannot be changed after it is initialized.
Control Structures
• Sequence structure
  o Programs are executed sequentially by default
• Selection structures
  o if, if/else, switch
• Repetition structures
  o while, do/while, for
Switch Multiple-Selection Structure
• switch
  o Tests a variable against multiple values
  o Series of case labels and an optional default case
switch ( variable ) {
    case value1:      // taken if variable == value1
        statements
        break;        // necessary to exit switch
    case value2:
    case value3:      // taken if variable == value2 or == value3
        statements
        break;
    default:          // taken if variable matches no other case
        statements
        break;
}
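The switch skeleton above, with grouped case labels and a default case, might be filled in as follows. The days-in-month task and the function names are chosen only for illustration and are not part of the original slides.

```cpp
#include <cassert>

// Number of days in a month (1 = January) of a non-leap year,
// or 0 when the month number is invalid.
int days_in_month(int month) {
    int days = 0;
    switch (month) {
        case 2:
            days = 28;
            break;          // without break, execution would fall through to the next case
        case 4:
        case 6:
        case 9:
        case 11:            // grouped labels: taken for April, June, September, November
            days = 30;
            break;
        case 1: case 3: case 5: case 7: case 8: case 10: case 12:
            days = 31;
            break;
        default:            // taken if month matches no other case
            days = 0;
            break;
    }
    return days;
}

// Combines the switch with a for loop (a repetition structure).
int days_in_year() {
    int total = 0;
    for (int month = 1; month <= 12; ++month) {
        total += days_in_month(month);
    }
    assert(total == 365);   // sanity check for a non-leap year
    return total;
}
```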
Example 1.3
/*Grade.cpp*/
#include <iostream>
#include <cstdlib> // for system()
using namespace std;
int main()
{
char let_grd;
cout <<"Please type in your grade"<<endl;
cin >>let_grd;
switch (let_grd)
{
case 'A':
cout << "Congratulations!";
break;
case 'B':
cout << "Good job!";
break;
case 'C':
cout << "ok, but you can do better!";
break;
case 'D':
cout << "Better luck next time!";
break;
case 'F':
cout << " Have fun in summer school!";
break;
default:
cout << "You entered an invalid grade.";
}
system("PAUSE"); // Windows only: keeps the console window open
}
Boost Library Installation
Boost Version 1.58.0 (http://www.boost.org/users/history/version_1_58_0.html) (size 117 MB)
“Installation
To install Boost.Build from an official release or a nightly build, as available on the official web site, follow these steps:
• Unpack the release. On the command line, go to the root of the unpacked tree.
• Run either .\bootstrap.bat (on Windows), or ./bootstrap.sh (on other operating systems).
• Run ./b2 install --prefix=PREFIX where PREFIX is a directory where you want Boost.Build to be installed.
• Optionally, add PREFIX/bin to your PATH environment variable.”
(On Windows, I ran bootstrap.bat and then .\b2.)
It takes about 30 minutes to install this library.
Boost Library Settings
• Code::Blocks 13.12
• Visual Studio
Example 1.4
• Use the Boost library to calculate the CDF of a normal distribution at a particular value
Code:
/*BoostCDF.cpp*/
#include <iostream>
#include <cstdlib> // for system()
#include <boost/math/distributions/normal.hpp> // for normal_distribution
using boost::math::normal; // typedef for the normal distribution; the default value type is double.
using namespace std;
int main()
{
double mean = 140.; // sacks per week.
double standard_deviation = 10;
normal sacks(mean, standard_deviation);
double stock = 160.; // per week.
cout << "Percentage of weeks overstocked "
<< cdf(sacks, stock) * 100. << endl; // P(X <=160)
// Percentage of weeks overstocked 97.7
system("pause"); // Windows only: keeps the console window open
}
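The value returned by cdf(sacks, stock) can be cross-checked without Boost. The sketch below is not the Boost implementation; it uses the standard identity $P(X \le x) = \tfrac{1}{2}\,\mathrm{erfc}\!\left(-\frac{x-\mu}{\sigma\sqrt{2}}\right)$ together with std::erfc from <cmath> (C++11). The function name normal_cdf is invented for illustration.

```cpp
#include <cassert>
#include <cmath>

// CDF of a normal distribution with the given mean and standard deviation:
// P(X <= x) = 0.5 * erfc(-(x - mean) / (sd * sqrt(2))).
double normal_cdf(double x, double mean, double sd) {
    assert(sd > 0.0);   // the standard deviation must be positive
    return 0.5 * std::erfc(-(x - mean) / (sd * std::sqrt(2.0)));
}
```

With mean 140 and standard deviation 10, a stock of 160 lies two standard deviations above the mean, so normal_cdf(160, 140, 10) is about 0.977, in line with the percentage quoted in Example 1.4.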
Part II. C++ with R
Rcpp
• What is Rcpp?
  An R package that provides R functions as well as C++ classes, which offer a seamless integration of R and C++.
• What are the advantages compared to R?
  o Recursive functions: the overhead of calling a function is much lower in C++ than in R
  o Advanced data structures and algorithms that R doesn’t provide
  o Loops that can’t be easily vectorised
Install Rcpp in R
• RStudio
• Install the Rcpp package
• Install Rtools
Why Rcpp?
A short example
• Review the previous C++ part
• See how to write a C++ function in R
• Compare the running speed to R
Example: The Fibonacci sequence is defined as a recursive sum of the two preceding terms in the same sequence: $F_n = F_{n-1} + F_{n-2}$, with these two initial conditions: $F_0 = 0$ and $F_1 = 1$, so that the first ten numbers of the sequence, $F_0$ to $F_9$, are seen to be 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.
Use R, C++ and Rcpp to write a function that computes Fibonacci numbers with the same (recursive) method first, then compare their running times.
Why Rcpp? (example 2.1)
• In R, the recursive function is:
fibR <- function(n) {
    if (n == 0) return(0)
    if (n == 1) return(1)
    return (fibR(n - 1) + fibR(n - 2))
}
• In C++, the code is:
#include <iostream>
using namespace std;
int fibonacci(const int x) {
    if (x == 0) return(0);
    if (x == 1) return(1);
    return (fibonacci(x - 1) + fibonacci(x - 2));
}
int main() {
    cout << "fibonacci 30 is " << fibonacci(30) << endl;
}
• In Rcpp:
  o With cppFunction():
library(Rcpp)
cppFunction('
int fibonacci(const int x) {
    if (x == 0) return(0);
    if (x == 1) return(1);
    return (fibonacci(x - 1)) + fibonacci(x - 2);
}')
  o In a .cpp file:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int fibonacci2(const int x) {
    if (x < 2) return x;
    else return (fibonacci2(x - 1)) + fibonacci2(x - 2);
}
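The recursive versions above are written that way on purpose, to compare function-call overhead across R, C++ and Rcpp. For contrast, an iterative version needs only a linear number of additions. The sketch below is an addition for illustration, not part of the benchmark; the names fib_iterative and fib_recursive are invented.

```cpp
#include <cassert>

// Iterative Fibonacci: F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2).
long long fib_iterative(int n) {
    assert(n >= 0 && n <= 92);   // F(92) is the largest value that fits in a signed 64-bit integer
    if (n == 0) return 0;
    long long previous = 0;      // F(i - 2)
    long long current = 1;       // F(i - 1)
    for (int i = 2; i <= n; ++i) {
        const long long next = previous + current;
        previous = current;
        current = next;
    }
    return current;
}

// Recursive version with the same logic as the slides, for comparison.
int fib_recursive(int n) {
    if (n < 2) return n;
    return fib_recursive(n - 1) + fib_recursive(n - 2);
}
```

Both functions agree on every input, for example F(30) = 832040, but the recursive one makes exponentially many calls as n grows.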
Why Rcpp? (example 2.1)
• Comparing the running time
Calculate the running time in C++:
#include <iostream>
#include <ctime>
using namespace std;
int fibonacci(const int x) {
    if (x == 0) return(0);
    if (x == 1) return(1);
    return (fibonacci(x - 1) + fibonacci(x - 2));
}
int main() {
    clock_t start_s, end_s;
    start_s = clock();
    int result = fibonacci(30); // keep the result so the timed call is not discarded
    end_s = clock();
    double cpu_time_used = ((double) (end_s - start_s)) / CLOCKS_PER_SEC; // in seconds
    cout << "fibonacci 30 is " << result << endl;
    cout << "time: " << cpu_time_used << endl;
}
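An alternative to clock() is the <chrono> library (C++11). Note the difference: clock() reports CPU time, while std::chrono::steady_clock measures elapsed wall-clock time. The sketch below is one possible way to package the measurement; the TimedResult struct and the time_fibonacci function are invented names.

```cpp
#include <cassert>
#include <chrono>

// Same recursive logic as the example above.
int fibonacci(int x) {
    if (x < 2) return x;
    return fibonacci(x - 1) + fibonacci(x - 2);
}

// Holds the result together with the elapsed wall-clock time in seconds.
struct TimedResult {
    int value;
    double seconds;
};

// Times a single call of fibonacci(n) with a monotonic clock.
TimedResult time_fibonacci(int n) {
    const auto start = std::chrono::steady_clock::now();
    const int value = fibonacci(n);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return TimedResult{value, elapsed.count()};
}
```

A single run can be noisy, so repeating the call several times and taking the median is usually more reliable than one measurement.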
• In R, running time can be measured with: benchmark, Sys.time(), proc.time(), system.time()
Code in RStudio:
fibR.time <- system.time(fibR(30))
fibonacci2.time <- system.time(fibonacci2(30))
ctime <- c(fibR.time[3], fibonacci2.time[3], 0.012) # 0.012: the C++ running time in seconds, entered by hand
spdup <- round(ctime[1] / ctime, 2)
names(spdup) <- c("Use R", "Rcpp", "C++")
midpoints <- barplot(spdup, ylab = "Speed up")
text(midpoints, c(100, 200, 800), labels = spdup)
Useful Variable Types & Control Structures for Rcpp
• Rules for writing a function in Rcpp:
  o You must declare the type of output the function returns. For example, fibonacci2 returns an int (a scalar integer).
  o Scalars and vectors are different. The scalar equivalents of numeric, integer, character, and logical vectors are: double, int, String, and bool.
  o You must use an explicit return statement to return a value from a function.
  o Every statement is terminated by a ;.
  o Put // [[Rcpp::export]] (without a trailing semicolon) immediately before each function you want to export from an Rcpp file.
Variable Types & Loop Functions
Variable Types
• Scalar: int, double, bool, String
• Vector: NumericVector, IntegerVector, CharacterVector, LogicalVector
• Matrix: IntegerMatrix, NumericMatrix, LogicalMatrix, CharacterMatrix
Loop Functions
• While loop
while (condition)
{
    statement(s);
}
• For loop
for (init; condition; increment)
{
    statement(s);
}
• Do while loop
do
{
    statement(s);
} while (condition);
• Nested loop
  Double for loops, double do while loops
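The loop forms above can be compared side by side by writing the same small task with each of them. This is a plain C++ sketch with invented function names; it uses int instead of Rcpp types so that it compiles without R.

```cpp
#include <cassert>

// Sum of 1..n with a while loop.
int sum_while(int n) {
    int total = 0;
    int i = 1;
    while (i <= n) {
        total += i;
        ++i;
    }
    return total;
}

// Sum of 1..n with a for loop.
int sum_for(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Sum of 1..n with a do while loop. The body always runs at least once,
// so the case n < 1 needs an explicit guard.
int sum_do_while(int n) {
    if (n < 1) return 0;
    int total = 0;
    int i = 1;
    do {
        total += i;
        ++i;
    } while (i <= n);
    return total;
}

// Nested for loops: counts the pairs (i, j) with 1 <= i < j <= n.
int count_pairs(int n) {
    int count = 0;
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            ++count;
        }
    }
    return count;
}
```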
Example 2.2
• Write a function to calculate the mean
  o For a vector, use a for loop
#include <Rcpp.h>
using namespace Rcpp;
/* mean of a vector */
// [[Rcpp::export]]
double rowmeanC(NumericVector x) {
    int n = x.size();
    double total = 0;
    for (int i = 0; i < n; ++i) {
        total += x[i];
    }
    return total / n;
}
  o For a matrix, use a for loop nested in a do while loop
/* row means of a matrix; assumes the matrix has at least one row */
// [[Rcpp::export]]
NumericVector rowmeanC2(NumericMatrix x) {
    int nrow = x.nrow(), ncol = x.ncol();
    NumericVector out(nrow);
    /* the do while loop replaces: for (int i = 0; i < nrow; i++) { ... } */
    int i = 0;
    do {
        double total = 0;
        for (int j = 0; j < ncol; j++) {
            total += x(i, j);
        }
        out[i] = total;
        i++;
    } while (i < nrow);
    return out / ncol;
}
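The same two computations can be tried outside R with standard containers. The sketch below swaps NumericVector and NumericMatrix for std::vector, so it is an analogue rather than Rcpp code. The Matrix typedef and the function names are invented, and a plain for loop is used for the rows so that an empty matrix is handled without a special case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A matrix stored as a vector of rows (an assumption made for this sketch).
typedef std::vector<std::vector<double> > Matrix;

// Mean of a non-empty vector.
double mean(const std::vector<double>& x) {
    assert(!x.empty());   // the mean of an empty vector is undefined
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        total += x[i];
    }
    return total / static_cast<double>(x.size());
}

// Mean of each row; returns an empty vector for a matrix with no rows.
std::vector<double> row_means(const Matrix& m) {
    std::vector<double> out;
    out.reserve(m.size());
    for (std::size_t i = 0; i < m.size(); ++i) {
        out.push_back(mean(m[i]));
    }
    return out;
}
```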
Rcpp Sugar
• “Sugar”
  o Syntactic “sugar” that makes C++ functions work very similarly to their R equivalents.
• Use sugar functions
  o They are both expressive and well tested.
Advantage: they may get faster in the future as more time is spent on optimising Rcpp.
Disadvantage: they aren’t always faster than a handwritten equivalent.
Rcpp Sugar Functions
A grab bag of sugar functions that mimic frequently used R functions
• Math functions: abs(), acos(), asin(), atan(), beta(), ceil(), ceiling(), choose(), cos(), cosh(), digamma(), exp(), expm1(), factorial(), floor(), gamma(), lbeta(), lchoose(), lfactorial(), lgamma(), log(), log10(), log1p(), pentagamma(), psigamma(), round(), signif(), sin(), sinh(), sqrt(), tan(), tanh(), tetragamma(), trigamma(), trunc().
• Scalar summaries: mean(), min(), max(), sum(), sd(), and (for vectors) var().
• Vector summaries: cumsum(), diff(), pmin(), and pmax().
• Finding values: match(), self_match(), which_max(), which_min().
• Dealing with duplicates: duplicated(), unique().
• d/q/p/r functions for all standard distributions.
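To see what a few of these sugar functions compute, here are handwritten analogues built on the C++ standard library. They are not the Rcpp implementations: the plain_ prefix marks them as invented names, and they take std::vector<double> instead of NumericVector so that the sketch compiles without R.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Analogue of sum(): total of all elements.
double plain_sum(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}

// Analogue of cumsum(): running totals.
std::vector<double> plain_cumsum(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    std::partial_sum(x.begin(), x.end(), out.begin());
    return out;
}

// Analogue of which_max(): zero-based index of the first maximum.
std::size_t plain_which_max(const std::vector<double>& x) {
    assert(!x.empty());   // an empty vector has no maximum
    return static_cast<std::size_t>(
        std::distance(x.begin(), std::max_element(x.begin(), x.end())));
}
```

The index returned by plain_which_max is zero-based, as is usual in C++, whereas R's which.max() counts from one.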
Example 2.3
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export()]]
NumericVector getNTN(int n) {
    NumericVector draw(n);
    double high = 3;
    double low = -3;
    NumericVector cand1 = rnorm(n, 0, 1); // sugar: n standard normal candidates
    // Clamp each candidate to the interval [low, high].
    for (int i = 0; i < n; i++) {
        if (cand1(i) >= high)
            draw(i) = high;
        else if (cand1(i) <= low)
            draw(i) = low;
        else
            draw(i) = cand1(i);
    }
    return draw;
}
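The clamping logic of Example 2.3 can be reproduced without R by using the <random> header (C++11). The sketch below is a stand-in, not the Rcpp code: the function name clamped_normal_draws and the seed parameter are invented, and std::mt19937 replaces R's random number generator, so the draws will not match those of rnorm().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draws n standard normal values and clamps each one to [low, high].
std::vector<double> clamped_normal_draws(int n, unsigned seed,
                                         double low = -3.0, double high = 3.0) {
    assert(n >= 0 && low <= high);
    std::mt19937 generator(seed);   // seeded so that runs are reproducible
    std::normal_distribution<double> standard_normal(0.0, 1.0);
    std::vector<double> draws;
    draws.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        const double candidate = standard_normal(generator);
        draws.push_back(std::min(high, std::max(low, candidate)));
    }
    return draws;
}
```

Clamping moves out-of-range values onto the boundaries, so the result is not a truncated normal sample in the strict sense; that would require redrawing the rejected values instead.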
More About Rcpp?
• The package comes with nine PDF vignettes and numerous help pages.
> vignette(package="Rcpp")
Vignettes in package ‘Rcpp’:
Rcpp-attributes, Rcpp-extending, Rcpp-FAQ, Rcpp-introduction, Rcpp-modules, Rcpp-package, Rcpp-quickref, Rcpp-sugar, Rcpp-unitTests
• Rcpp online tutorial (http://adv-r.had.co.nz/Rcpp.html)
• Rcpp Gallery (http://gallery.rcpp.org/)
• Book
Reference
• Bjarne Stroustrup (http://www.stroustrup.com/)
• Rcpp online tutorial (http://adv-r.had.co.nz/Rcpp.html)
• Rcpp Gallery (http://gallery.rcpp.org/)
• “Seamless R and C++ Integration with Rcpp” by Dirk Eddelbuettel.
Thanks!
Questions?