Rcpp
Rcpp
— FAQsWhat is Rcpp
?
The Rcpp
package allows to implement and easily invoke C++
functions in R
by providing a clean and approachable API.
Who should use it?
Anyone who is interested in eliminating bottlenecks which cannot be sped-up in vanilla R
.
Is Rcpp
a self-contained language?
No. To write code using Rcpp
basically means to write C++
code using special C++
classes that allow interfacing with R
.
The official quick reference has some code snippets which facilitate the start.
Rcpp
— FAQsDo I need to be fluent in C++
?
No. Rcpp
is easy to use, even without deep knowledge of C++
. A basic knowledge of C++
syntax and data types is sufficient for a start.
The package allows to write C++ code in a style similar to R — syntactic sugar:
Rcpp
comes with well tested functions which look and behave very similar to their R equivalent. Sugar versions of arithmetic and logical operators, vector views and many base R functions facilitate the first steps in programming with Rcpp
tremendously.
In what follows we will not spend much time on advanced stuff like OO-C++
programming or templates but focus on writing small, self-contained functions.
learnCpp is a good source for details on these subjects.
Rcpp
— FAQsWhen is rewriting code using C++
an attractive alternative?
Typical bottlenecks in R
that be addressed using C++
are:
Loops that can’t be easily vectorised because subsequent iterations depend on previous ones.
(e.g., algorithms for Markov chain Monte Carlo in Bayesian models)
C++
is much lower than that in R
.Rcpp
— FAQsWhere to learn details and where to find good examples?
The Rcpp
package vignette is a good start
The Rcpp Gallery provides many comprehensive examples and also applications to statistics and econometrics
stackoverflow has answers to numerous questions that arise in the beginning (more than 2800 question tagged [rcpp] answered). The package authors (especially Dirk Eddelbuettel) often contribute to the discussion.
Technical details are well explained in Seamless R and C++ integration with Rcpp (Eddelbuettel, 2013)
(PDF freely available on Springer Link via UDE intranet)
Rcpp
— PrerequisitesWe need Rcpp
from CRAN as well as a working C++
compiler. Installation is OS-dependent for the latter.
## # get the Rcpp package from CRAN## install.packages("Rcpp")library(Rcpp)
Installing a C++
compiler
Windows: install rtools
Mac: install Xcode
Linux: install gcc/g++
via sudo apt-get install r-base-dev or similar.
Rcpp
— PrerequisitesSome vocabulary
A pointer contains the physical address of an data in the memory
At the C
level, R
objects are stored in the SEXP (S expression) data type. An SEXP is essentially a pointer.
So how does it work?
We use C++
functions which generate and modify SEXP objects faster than R
is able to.
Rcpp
allows a seamless transfer between C++
and an SEXP objects with low cost by passing pointers (pass by reference) to C++
functions using special classes which enclose SEXPs (we will introduce these in a bit).
Rcpp
— C++ Syntax BasicsExample: Syntax in C++
— 1
int one() { return 1;}
Note that:
We do not use assignment to create functions in C++
!
We need to declare the type returned by the function (and also for its arguments)
Scalars and vectors are different objects! There are also differences w.r.t. scalar types:
numeric
, integer
, character
and logical
(R
)
double, int, string, bool (C++
)
The return statement must be used to return a value!
Every statement must be followed by a ;
Rcpp
— C++ Syntax BasicsExample: Syntax in C++
— 2
# R functionisOdd_r <- function(num = 10L) { result <- (num %% 2L == 1L) return(result)}isOdd_r(42L)
## [1] FALSE
# C++ equivalent # (not yet executable!)bool isOdd_cpp(int num = 10) { bool result = (num % 2 == 1); return result;}
Assignment is done using the = operator
Binary arithmetic and logical operators work as in R
. More on this later.
Default values for arguments may be specified.
Important: Default arguments are assigned from right to left. Using a default value requires all subsequent arguments to have default values, too.
Rcpp
— Inline CompilationExample: cppFunction()
Small self-contained functions are easily compiled and linked using cppFunction()
.
Rcpp::cppFunction('bool isOdd_cpp(int num = 10) { bool result = (num % 2 == 1); return result;}')
isOdd_cpp()
now works like a regular R
function.
isOdd_cpp(42L)
## [1] FALSE
Rcpp
— Inline CompilationExample: cppFunction()
What happened?
Our function isOdd_cpp()
is sourced by Rcpp
.
Rcpp
generates a wrapper function in R
. It then compiles and links the C++
function to the wrapper. The function is now available in R
under its C++
name.
isOdd_cpp
## function (num = 10L) ## .Call(<pointer: 0x107b5c300>, num)
We see that isOdd_cpp()
uses .call()
to invoke the C++
version of the function which is stored in the specified pointer.
Rcpp
— Sourcing a .cpp
-file via RStudio
Rcpp
— Sourcing a .cpp
-File via RStudioYou will be prompted with the following file:
#include <Rcpp.h>using namespace Rcpp;// This is a simple example of exporting a C++ function to R. You can// source this function into an R session using the Rcpp::sourceCpp // function (or via the Source button on the editor toolbar). // [[Rcpp::export]]NumericVector timesTwo(NumericVector x) { return x * 2;}// You can include R code blocks in C++ files processed with sourceCpp// (useful for testing and development). The R code will be automatically // run after the compilation.//// /*** R// timesTwo(42)// */
Rcpp
— LoopsExample: Summation
sum_R <- function(x) { total <- 0 for (i in seq_along(x)) { total <- total + x[i] } total}
double sum_C(NumericVector x) { int n = x.size(); double total = 0; for(int i = 0; i < n; ++i) { total += x[i]; } return total;}
Indices start at 0!
C++ methods are called with a full stop, e.g. .size()
Syntax in for()
: for(initialize; check; increment)
In-place modification: total += x[i]
is equivalent to
total = total[i-1] + x[i]
Rcpp
— LoopsExample: Summation — ctd.
x <- runif(1e3)bench::mark( sum(x), sum_C(x), sum_R(x))
## # A tibble: 3 × 6## expression min median `itr/sec` mem_alloc `gc/sec`## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>## 1 sum(x) 1.56µs 1.64µs 582125. 0B 0## 2 sum_C(x) 1.56µs 1.76µs 460865. 2.49KB 0## 3 sum_R(x) 14.47µs 15.05µs 65413. 19.74KB 0
sum_C()
is competitive with the highly optimised primitive sum()
. sum_R()
is considerably slower.
Rcpp
— Vectors and MatricesReturn type of functions
Rcpp
automatically wraps C++
base types (like double
, int
, string
, bool
) using classes which can be handled by R
This is different for more complex structures like vector and matrix. Rcpp
provides special classes for each base type which have to be stated explicitly:
Using classes from the C++ Standard Template Library (STL) will not pose a problem, but requires a deep copy (pass-by-value) whereas using the appropriate Rcpp
class results in a modification of the object (pass-by-reference)
Rcpp
— Vectors and MatricesExample: STL-type mapping
The following code is 'pure' C++
and can be handled by R
: Rcpp
maps std::vector<type>
to NumericVector
.
(Note that we use ::
to access objects from the std
namespace)
std::vector<int> zeros_STL(int N) { std::vector<int> X(N); return X;}
The above requires a copy of X
which is avoided if we use IntegerVector or NumericVector instead.
NumericVector zeros_Rcpp(int N) { NumericVector X(N); return X;}
Rcpp
— Vectors and MatricesExample: STL-type mapping — ctd.
We compare both functions after sourcing using cppFunction()
.
x <- 1e6Lbench::mark( zeros_Rcpp(x), zeros_STL(x))
## ## # A tibble: 2 x 13## expression min median `itr/sec` mem_alloc `gc/sec` ## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> ## 1 zeros_Rcpp(x) 729.5µs 1.02ms 307. 7.63MB 5.44## 2 zeros_STL(x) 1.85ms 3.31ms 225. 3.82MB 2.64
Rcpp
— Vectors and MatricesSometimes it will be convenient to generate objects using the STL and use wrap()
before returning them.
Example: wrap()
// using STL constructor functionNumericVector iota(int N) { std::vector<int> X(N, 1); return wrap(X);}
// using Rcpp constructorNumericVector iota(int N) { NumericVector X(N); std::fill(X.begin(), X.end(), 1) return X;}
Rcpp
— Vectors and MatricesC++
implementations that heavily rely on loops can often compete with R
primitives in terms of speed.
Example: rowSumsC()
NumericVector rowSumsC(NumericMatrix x) { int nrow = x.nrow(), ncol = x.ncol(); NumericVector out(nrow); for (int i = 0; i < nrow; i++) { double total = 0; for (int j = 0; j < ncol; j++) { total += x(i, j); } out[i] = total; } return out;}
Note the short-hand definition of variables of the same type.
We subset matrices with () instead of [].
Rcpp
— Data frames and ListsRcpp
has classes List
and DataFrame
which are most interesting for input to and output of functions.
Example: extracting data from lm
object
The function below extracts the residuals from a linear model of class lm
.
NumericVector residuals(List mod) { NumericVector resids = as<NumericVector>(mod["residuals"]); return resids;}
Remember that in R, a data.frame
is an object of type list
so the above function would also work on a data.frame
with named column residuals
.
Note that manual conversion to NumericVector
using as()
is not necessary here because residuals
are always of the same type.
Rcpp
— Data frames and ListsExample: extract data from lm
object — ctd.
We may return selected entries of an lm
object in a list
(note that the Rcpp
class is List
with a capital L
)
List components(List mod) { NumericVector resids = as<NumericVector>(mod["residuals"]); NumericVector coefs = as<NumericVector>(mod["coefficients"]); NumericVector fitted = as<NumericVector>(mod["fitted.values"]); List out = List::create( Named("Residuals") = resids, Named("Coefficients") = coefs, Named("Fitted Values") = fitted ); return out;}
Rcpp
— Two Paradigms
Rcpp
— Two ParadigmsDigression: pass by reference vs. pass by value in C++
In C++
a pass by value happens by default.
(The functions below have return type void
, i.e., they do not return anything.)
void test(int A) { // A is passed by value int B = A; // B is a copy of A A = 2 * A; // A is altered inside the function}
Pass by reference is invoked using the address-of operator &
.
void test(int& A) { // A is passed by reference, no copy is created int B = A; // B is a copy of A A = 2 * A; // A is altered inside AND outside the function}
Rcpp
— Two ParadigmsDigression: pass by reference vs. pass by value in C++
We may protect inputs from being altered inside and outside the function.
void test(const int A) { // A is passed by value int B = 2 * A; // and cannot be altered}
void test(const int& A) { // A is passed by reference int B = 2 * A; // and cannot be altered}
Note that it is admissible to use the address-of operator &
in variable assignment.
int& B = A; // B is a reference to A
Rcpp
— Two ParadigmsDigression: pass by reference vs. pass by value in Rcpp
The behavior is somewhat different in Rcpp
.
#include<Rcpp.h>// [[Rcpp::export]]void test_reference(NumericVector A) { NumericVector B = A; Rcout << "Before: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl; A[1] = 0.5; Rcout << "After: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl; }
Rcpp
— Two ParadigmsDigression: pass by reference vs. pass by value in Rcpp
Rcpp
does not copy an object when passed as an argument to a function or on assignment!
x <- 1:3/10test_reference(x)
## Before: ## A: 0.1 0.2 0.3## B: 0.1 0.2 0.3## After: ## A: 0.1 0.5 0.3## B: 0.1 0.5 0.3
test_reference()
alters the values of A
and B
as expected. But what happened to x
?
Rcpp
— Two ParadigmsDigression: pass by reference vs. pass by value in Rcpp
We find that x
was modified outside the function.
print(x) # check that x was modified
## [1] 0.1 0.5 0.3
What happened?
The value passed to A
in test_reference()
is essentially the address pointing to x
in the memory. The function works on x
by reference.
Also the line NumericVector B = A;
does not create a copy of A
but an object pointing to data at the same memory location. Altering the value of A
thus alters the value of B
, too!
Rcpp
— Two ParadigmsDigression: pass by reference vs. pass by value in Rcpp
Using clone()
we may bypass the reference on assignment and work with the copy B
.
#include<Rcpp.h>// [[Rcpp::export]]void test_clone(NumericVector A) { NumericVector B = clone(A); Rcout << "Before: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl; B[1] = 0.5; Rcout << "After: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl; }
Rcpp
— Two ParadigmsDigression: pass by reference vs. pass by value in Rcpp
x <- 1:3/10test_clone(x)
## Before: ## A: 0.1 0.2 0.3## B: 0.1 0.2 0.3## After: ## A: 0.1 0.2 0.3## B: 0.1 0.5 0.3
print(x) # check that x was not modified
## [1] 0.1 0.2 0.3
Rcpp
— Exercises Part 1Find out why the following code gives a compile error:
#include<Rcpp.h>// [[Rcpp::plugins("cpp11")]]NumericVector x{1, 2, 3, 4, 5};IntegerVector id{1};double y = x[id]; // produces compile error
Rcpp
— Exercises Part 1Benchmark the functions below against each other for x<-rnorm(1e2)
, x<-rnorm(1e4)
and x<-rnorm(1e6)
. Comment on the results.
NumericVector test_clone_return(NumericVector A) { NumericVector B = clone(A); B[1] = 0.5; return B;}
NumericVector test_reference_return(NumericVector A) { A[1] = 0.5; return A;}
Rcpp
— Syntactic SugarAnother benefit of using Rcpp
classes is Rcpp sugar — more efficient high-level syntax similar that is similar to R
. We give simple examples for the most important features.
See the (unofficial) list of sugar functions for more examples.
Binary arithmetic, logical and unary operators
NumericVector x, y;//arithmeticNumericVector res = x / y;NumericVector res = 2.0 - x;NumericVector res = x / ( y * y );//comparisonLogicalVector res = x <= y;LogicalVector res = x < 2;LogicalVector res = ( x + y ) < ( x*x );//negationNumericVector res = -x;LogicalVector res = ! ( y < z );
Rcpp
— Syntactic SugarLogical summary functions
// at least one elementis_true( any(x <= y) );is_false( any(x <= y) );// all elementsis_true( all(x <= y) );is_false( all(x <= y) );// all the above return boolbool res = is_false( all(x <= y) );
There is no recycling: vectors need to have equal lengths, otherwise the compiler will return an error.
Note that the conversion to bool using is_true() or is_false() is mandatory. Omission will result in a compiler error.
Rcpp
— Syntactic SugarVector views
Rcpp
provides functions which give vector 'views' more efficiently than their R
equivalents since they use pointers instead of copies.
head(x)tail(x)rep_each(x, 10)rep_len(x, 3)rev(x)seq_along(x)seq_len(10)
Rcpp
— Syntactic SugarMathematics and statistics
Most math operations, scalar and vector summaries, and search algorithms have performant sugar equivalents.
// some examples of mathematical sugar implementationsabs(), ceil(), ceiling(), choose(), exp(), factorial(), floor(), log(), sin(), sinh(), sqrt()
// some examples of scalar and vector summary functionsmean(), min(), max(), sum(), sd(), var(), cumsum(), diff(), pmin(), pmax()
// finding valuesmatch(), self_match(), which_max(), which_min()
Very convenient: the d/p/q/r functions for standard distributions are implemented, too.
NumericVector x = rnorm(100);
Rcpp
— Standard Template Library (STL)The C++ STL is an extensive collection of generic data structures, algorithms and iterators. We will discuss simple examples of some components.
The cplusplus reference provides a comprehensive summary of the available libraries.
The following collections are recommended for beginners to start with:
numeric — operations on sequences of numeric values
algorithm — functions which work on ranges of elements using iterators
The following examples show .cpp-files as some require a different header than included by default
Rcpp
— STL IteratorsIterator objects come with operators that allow to iterate over a range elements in in a more abstract way than a classic for()
loop does with an indexing variable.
Example: NumericVector::iterator
#include <Rcpp.h>using namespace Rcpp;// [[Rcpp::export]]double sum_it(NumericVector x) { double total = 0; NumericVector::iterator it; for(it = x.begin(); it != x.end(); it++) { total += *it; } return total;}
Each Rcpp vector type has a separate iterator
Note that we use the dereference operator *it
to get the value of the current element
Rcpp
— STL IteratorsThere are functions which accept iterators as arguments.
Example: apply()
-style functions
#include <numeric>#include <Rcpp.h>using namespace Rcpp;// [[Rcpp::export]]double sum_accu(NumericVector x) { return std::accumulate(x.begin(), x.end(), 0.0);}
accumulate()
is in the numeric STL so we have to include it in the header
Setting the starting value 0.0
ensures that a double
is returned (use 0
for an integer
)
Rcpp
— STL AlgorithmsA very useful STL algorithm is transform()
which applies an operation sequentially to elements in (up to two) ranges and stores the result in another range.
Example: std::transform()
We may use transform()
to square all elements of a vector.
#include <Rcpp.h>using namespace Rcpp;double f(double x) { return std::pow(x, 2); }// [[Rcpp::export]]std::vector<double> square(std::vector<double> x) { std::transform(x.begin(), x.end(), x.begin(), f);return x;}
Note that we do not need to export f()
because it is only a subroutine of square()
Rcpp
— STL AlgorithmsExample: (partial) sort
The following Rcpp
function returns a copy of the input vector with the first n
elements sorted.
// [[Rcpp::export]]NumericVector nth_partial_sort(NumericVector x, int nth) { NumericVector y = clone(x); std::nth_element(y.begin(), y.begin()+nth, y.end()); std::sort(y.begin(), y.begin()+nth); return y;}
Rcpp
— STL AlgorithmsExample: (partial) sort
Let's benchmark nth_partial_sort()
against R's sort()
for a full sort.
x <- rnorm(100)bench::mark( nth_partial_sort(x, 100), sort(x), relative = T)
## # A tibble: 2 × 6## expression min median `itr/sec` mem_alloc `gc/sec`## <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 nth_partial_sort(x, 100) 1 1 10.5 2.62 10.5## 2 sort(x) 12.9 11.8 1 1 1
Rcpp
— Programming StrategyWrite most of your code in R
to take advantage of its high abstraction level
Profile your code, identify bottlenecks and replace those
sections (by writing entire functions) with C++
for performance improvement.
Make sure that your C++
routines have the same call signature as their slow R
equivalents.
Rcpp
— Exercises Part 2Implement the R
function below using Rcpp
for integer input (and output) and benchmark both versions.
f <- function(n) { if(n < 3) return(n) return(f(n-1) + f(n-2))}
Reconsider the function nth_partial_sort()
. Compare its performance with that of R
's sort()
for large vectors. Explain your findings.
Convert all()
, range()
and var()
into C++
. You may assume that the inputs have no missing values.
Rcpp
— FAQsWhat is Rcpp
?
The Rcpp
package allows to implement and easily invoke C++
functions in R
by providing a clean and approachable API.
Who should use it?
Anyone who is interested in eliminating bottlenecks which cannot be sped-up in vanilla R
.
Is Rcpp
a self-contained language?
No. To write code using Rcpp
basically means to write C++
code using special C++
classes that allow interfacing with R
.
The official quick reference has some code snippets which facilitate the start.
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |