+ - 0:00:00
Notes for current slide
Notes for next slide

Advanced R for Econometricians

Speeding Things Up with Rcpp

Martin C. Arnold, Jens Klenke

1 / 45

Rcpp — FAQs

What is Rcpp?

The Rcpp package allows to implement and easily invoke C++ functions in R by providing a clean and approachable API.

Who should use it?

Anyone who is interested in eliminating bottlenecks which cannot be sped-up in vanilla R.

Is Rcpp a self-contained language?

No. To write code using Rcpp basically means to write C++ code using special C++ classes that allow interfacing with R.

The official quick reference has some code snippets which facilitate the start.

2 / 45

Rcpp — FAQs

Do I need to be fluent in C++?

  • No. Rcpp is easy to use, even without deep knowledge of C++. A basic knowledge of C++ syntax and data types is sufficient for a start.

  • The package allows to write C++ code in a style similar to R — syntactic sugar:

    Rcpp comes with well tested functions which look and behave very similar to their R equivalent. Sugar versions of arithmetic and logical operators, vector views and many base R functions facilitate the first steps in programming with Rcpp tremendously.

  • In what follows we will not spend much time on advanced stuff like OO-C++ programming or templates but focus on writing small, self-contained functions.

    learnCpp is a good source for details on these subjects.

3 / 45

Rcpp — FAQs

When is rewriting code using C++ an attractive alternative?

Typical bottlenecks in R that be addressed using C++ are:

  • Loops that can’t be easily vectorised because subsequent iterations depend on previous ones.

    (e.g., algorithms for Markov chain Monte Carlo in Bayesian models)

  • Recursive functions/problems which involve calling functions a large number of times. The overhead of calling functions in C++ is much lower than that in R.
4 / 45

Rcpp — FAQs

Where to learn details and where to find good examples?

  • The Rcpp package vignette is a good start

  • The Rcpp Gallery provides many comprehensive examples and also applications to statistics and econometrics

  • stackoverflow has answers to numerous questions that arise in the beginning (more than 2800 question tagged [rcpp] answered). The package authors (especially Dirk Eddelbuettel) often contribute to the discussion.

  • Technical details are well explained in Seamless R and C++ integration with Rcpp (Eddelbuettel, 2013)

    (PDF freely available on Springer Link via UDE intranet)

5 / 45

Rcpp — Prerequisites

We need Rcpp from CRAN as well as a working C++ compiler. Installation is OS-dependent for the latter.

## # get the Rcpp package from CRAN
## install.packages("Rcpp")
library(Rcpp)

Installing a C++ compiler

  • Windows: install rtools

  • Mac: install Xcode

  • Linux: install gcc/g++ via sudo apt-get install r-base-dev or similar.

6 / 45

Rcpp — Prerequisites

Some vocabulary

  • A pointer contains the physical address of an data in the memory

  • At the C level, R objects are stored in the SEXP (S expression) data type. An SEXP is essentially a pointer.

So how does it work?

  • We use C++ functions which generate and modify SEXP objects faster than R is able to.

  • Rcpp allows a seamless transfer between C++ and an SEXP objects with low cost by passing pointers (pass by reference) to C++ functions using special classes which enclose SEXPs (we will introduce these in a bit).

7 / 45

Rcpp — C++ Syntax Basics

Example: Syntax in C++ — 1

int one() {
return 1;
}

Note that:

  • We do not use assignment to create functions in C++!

  • We need to declare the type returned by the function (and also for its arguments)

  • Scalars and vectors are different objects! There are also differences w.r.t. scalar types:

    numeric, integer, character and logical (R)

    double, int, string, bool (C++)

  • The return statement must be used to return a value!

  • Every statement must be followed by a ;

8 / 45

Rcpp — C++ Syntax Basics

Example: Syntax in C++ — 2

# R function
isOdd_r <- function(num = 10L) {
result <- (num %% 2L == 1L)
return(result)
}
isOdd_r(42L)
## [1] FALSE
# C++ equivalent
# (not yet executable!)
bool isOdd_cpp(int num = 10) {
bool result = (num % 2 == 1);
return result;
}
  • Assignment is done using the = operator

  • Binary arithmetic and logical operators work as in R. More on this later.

  • Default values for arguments may be specified.

    Important: Default arguments are assigned from right to left. Using a default value requires all subsequent arguments to have default values, too.

9 / 45

Rcpp — Inline Compilation

Example: cppFunction()

Small self-contained functions are easily compiled and linked using cppFunction().

Rcpp::cppFunction('
bool isOdd_cpp(int num = 10) {
bool result = (num % 2 == 1);
return result;
}')

isOdd_cpp() now works like a regular R function.

isOdd_cpp(42L)
## [1] FALSE
10 / 45

Rcpp — Inline Compilation

Example: cppFunction()

What happened?

  • Our function isOdd_cpp() is sourced by Rcpp.

  • Rcpp generates a wrapper function in R. It then compiles and links the C++ function to the wrapper. The function is now available in R under its C++ name.

isOdd_cpp
## function (num = 10L)
## .Call(<pointer: 0x107b5c300>, num)

We see that isOdd_cpp() uses .call() to invoke the C++ version of the function which is stored in the specified pointer.

11 / 45

Rcpp — Sourcing a .cpp-file via RStudio



12 / 45

Rcpp — Sourcing a .cpp-File via RStudio

You will be prompted with the following file:

#include <Rcpp.h>
using namespace Rcpp;
// This is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar).
// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
return x * 2;
}
// You can include R code blocks in C++ files processed with sourceCpp
// (useful for testing and development). The R code will be automatically
// run after the compilation.
//
// /*** R
// timesTwo(42)
// */
13 / 45

Rcpp — Loops

Example: Summation

sum_R <- function(x) {
total <- 0
for (i in seq_along(x)) {
total <- total + x[i]
}
total
}
double sum_C(NumericVector x) {
int n = x.size();
double total = 0;
for(int i = 0; i < n; ++i) {
total += x[i];
}
return total;
}
  • Indices start at 0!

  • C++ methods are called with a full stop, e.g. .size()

  • Syntax in for(): for(initialize; check; increment)

  • In-place modification: total += x[i] is equivalent to total = total[i-1] + x[i]

14 / 45

Rcpp — Loops

Example: Summation — ctd.

x <- runif(1e3)
bench::mark(
sum(x),
sum_C(x),
sum_R(x)
)
## # A tibble: 3 × 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 sum(x) 1.56µs 1.64µs 582125. 0B 0
## 2 sum_C(x) 1.56µs 1.76µs 460865. 2.49KB 0
## 3 sum_R(x) 14.47µs 15.05µs 65413. 19.74KB 0

sum_C() is competitive with the highly optimised primitive sum(). sum_R() is considerably slower.

15 / 45

Rcpp — Vectors and Matrices

Return type of functions

  • Rcpp automatically wraps C++ base types (like double, int, string, bool) using classes which can be handled by R

  • This is different for more complex structures like vector and matrix. Rcpp provides special classes for each base type which have to be stated explicitly:

    • NumericVector, IntegerVector, CharacterVector and LogicalVector
    • NumericMatrix, IntegerMatrix, CharacterMatrix and LogicalMatrix
  • Using classes from the C++ Standard Template Library (STL) will not pose a problem, but requires a deep copy (pass-by-value) whereas using the appropriate Rcpp class results in a modification of the object (pass-by-reference)

16 / 45

Rcpp — Vectors and Matrices

Example: STL-type mapping

The following code is 'pure' C++ and can be handled by R: Rcpp maps std::vector<type> to NumericVector.

(Note that we use :: to access objects from the std namespace)

std::vector<int> zeros_STL(int N) {
std::vector<int> X(N);
return X;
}

The above requires a copy of X which is avoided if we use IntegerVector or NumericVector instead.

NumericVector zeros_Rcpp(int N) {
NumericVector X(N);
return X;
}
17 / 45

Rcpp — Vectors and Matrices

Example: STL-type mapping — ctd.

We compare both functions after sourcing using cppFunction().

x <- 1e6L
bench::mark(
zeros_Rcpp(x),
zeros_STL(x)
)
##
## # A tibble: 2 x 13
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 zeros_Rcpp(x) 729.5µs 1.02ms 307. 7.63MB 5.44
## 2 zeros_STL(x) 1.85ms 3.31ms 225. 3.82MB 2.64
18 / 45

Rcpp — Vectors and Matrices

Sometimes it will be convenient to generate objects using the STL and use wrap() before returning them.

Example: wrap()

// using STL constructor function
NumericVector iota(int N) {
std::vector<int> X(N, 1);
return wrap(X);
}
vs.
// using Rcpp constructor
NumericVector iota(int N) {
NumericVector X(N);
std::fill(X.begin(), X.end(), 1)
return X;
}
19 / 45

Rcpp — Vectors and Matrices

C++ implementations that heavily rely on loops can often compete with R primitives in terms of speed.

Example: rowSumsC()

NumericVector rowSumsC(NumericMatrix x) {
int nrow = x.nrow(), ncol = x.ncol();
NumericVector out(nrow);
for (int i = 0; i < nrow; i++) {
double total = 0;
for (int j = 0; j < ncol; j++) {
total += x(i, j);
}
out[i] = total;
}
return out;
}
  • Note the short-hand definition of variables of the same type.

  • We subset matrices with () instead of [].

20 / 45

Rcpp — Data frames and Lists

Rcpp has classes List and DataFrame which are most interesting for input to and output of functions.

Example: extracting data from lm object

The function below extracts the residuals from a linear model of class lm.

NumericVector residuals(List mod) {
NumericVector resids = as<NumericVector>(mod["residuals"]);
return resids;
}
  • Remember that in R, a data.frame is an object of type list so the above function would also work on a data.frame with named column residuals.

  • Note that manual conversion to NumericVector using as() is not necessary here because residuals are always of the same type.

21 / 45

Rcpp — Data frames and Lists

Example: extract data from lm object — ctd.

We may return selected entries of an lm object in a list

(note that the Rcpp class is List with a capital L)

List components(List mod) {
NumericVector resids = as<NumericVector>(mod["residuals"]);
NumericVector coefs = as<NumericVector>(mod["coefficients"]);
NumericVector fitted = as<NumericVector>(mod["fitted.values"]);
List out = List::create(
Named("Residuals") = resids,
Named("Coefficients") = coefs,
Named("Fitted Values") = fitted
);
return out;
}
22 / 45

Rcpp — Two Paradigms




23 / 45

Rcpp — Two Paradigms

Digression: pass by reference vs. pass by value in C++

In C++ a pass by value happens by default.

(The functions below have return type void, i.e., they do not return anything.)

void test(int A) { // A is passed by value
int B = A; // B is a copy of A
A = 2 * A; // A is altered inside the function
}

Pass by reference is invoked using the address-of operator &.

void test(int& A) { // A is passed by reference, no copy is created
int B = A; // B is a copy of A
A = 2 * A; // A is altered inside AND outside the function
}
24 / 45

Rcpp — Two Paradigms

Digression: pass by reference vs. pass by value in C++

We may protect inputs from being altered inside and outside the function.

void test(const int A) { // A is passed by value
int B = 2 * A; // and cannot be altered
}
void test(const int& A) { // A is passed by reference
int B = 2 * A; // and cannot be altered
}

Note that it is admissible to use the address-of operator & in variable assignment.

int& B = A; // B is a reference to A
25 / 45

Rcpp — Two Paradigms

Digression: pass by reference vs. pass by value in Rcpp

The behavior is somewhat different in Rcpp.

#include<Rcpp.h>
// [[Rcpp::export]]
void test_reference(NumericVector A) {
NumericVector B = A;
Rcout << "Before: " << std::endl <<
"A: " << A << std::endl <<
"B: " << B << std::endl;
A[1] = 0.5;
Rcout << "After: " << std::endl <<
"A: " << A << std::endl <<
"B: " << B << std::endl;
}
26 / 45

Rcpp — Two Paradigms

Digression: pass by reference vs. pass by value in Rcpp

Rcpp does not copy an object when passed as an argument to a function or on assignment!

x <- 1:3/10
test_reference(x)
## Before:
## A: 0.1 0.2 0.3
## B: 0.1 0.2 0.3
## After:
## A: 0.1 0.5 0.3
## B: 0.1 0.5 0.3

test_reference() alters the values of A and B as expected. But what happened to x?

27 / 45

Rcpp — Two Paradigms

Digression: pass by reference vs. pass by value in Rcpp

We find that x was modified outside the function.

print(x) # check that x was modified
## [1] 0.1 0.5 0.3

What happened?

  • The value passed to A in test_reference() is essentially the address pointing to x in the memory. The function works on x by reference.

  • Also the line NumericVector B = A; does not create a copy of A but an object pointing to data at the same memory location. Altering the value of A thus alters the value of B, too!

28 / 45

Rcpp — Two Paradigms

Digression: pass by reference vs. pass by value in Rcpp

Using clone() we may bypass the reference on assignment and work with the copy B.

#include<Rcpp.h>
// [[Rcpp::export]]
void test_clone(NumericVector A) {
NumericVector B = clone(A);
Rcout << "Before: " << std::endl <<
"A: " << A << std::endl <<
"B: " << B << std::endl;
B[1] = 0.5;
Rcout << "After: " << std::endl <<
"A: " << A << std::endl <<
"B: " << B << std::endl;
}
29 / 45

Rcpp — Two Paradigms

Digression: pass by reference vs. pass by value in Rcpp

x <- 1:3/10
test_clone(x)
## Before:
## A: 0.1 0.2 0.3
## B: 0.1 0.2 0.3
## After:
## A: 0.1 0.2 0.3
## B: 0.1 0.5 0.3
print(x) # check that x was not modified
## [1] 0.1 0.2 0.3
30 / 45

Rcpp — Exercises Part 1

Find out why the following code gives a compile error:

#include<Rcpp.h>
// [[Rcpp::plugins("cpp11")]]
NumericVector x{1, 2, 3, 4, 5};
IntegerVector id{1};
double y = x[id]; // produces compile error
31 / 45

Rcpp — Exercises Part 1

Benchmark the functions below against each other for x<-rnorm(1e2), x<-rnorm(1e4) and x<-rnorm(1e6). Comment on the results.

NumericVector test_clone_return(NumericVector A) {
NumericVector B = clone(A);
B[1] = 0.5;
return B;
}
NumericVector test_reference_return(NumericVector A) {
A[1] = 0.5;
return A;
}
32 / 45

Rcpp — Syntactic Sugar

Another benefit of using Rcpp classes is Rcpp sugar — more efficient high-level syntax similar that is similar to R. We give simple examples for the most important features.

See the (unofficial) list of sugar functions for more examples.

Binary arithmetic, logical and unary operators

NumericVector x, y;
//arithmetic
NumericVector res = x / y;
NumericVector res = 2.0 - x;
NumericVector res = x / ( y * y );
//comparison
LogicalVector res = x <= y;
LogicalVector res = x < 2;
LogicalVector res = ( x + y ) < ( x*x );
//negation
NumericVector res = -x;
LogicalVector res = ! ( y < z );
33 / 45

Rcpp — Syntactic Sugar

Logical summary functions

// at least one element
is_true( any(x <= y) );
is_false( any(x <= y) );
// all elements
is_true( all(x <= y) );
is_false( all(x <= y) );
// all the above return bool
bool res = is_false( all(x <= y) );
  • There is no recycling: vectors need to have equal lengths, otherwise the compiler will return an error.

  • Note that the conversion to bool using is_true() or is_false() is mandatory. Omission will result in a compiler error.

34 / 45

Rcpp — Syntactic Sugar

Vector views

Rcpp provides functions which give vector 'views' more efficiently than their R equivalents since they use pointers instead of copies.

head(x)
tail(x)
rep_each(x, 10)
rep_len(x, 3)
rev(x)
seq_along(x)
seq_len(10)
35 / 45

Rcpp — Syntactic Sugar

Mathematics and statistics

Most math operations, scalar and vector summaries, and search algorithms have performant sugar equivalents.

// some examples of mathematical sugar implementations
abs(), ceil(), ceiling(), choose(), exp(), factorial(), floor(),
log(), sin(), sinh(), sqrt()
// some examples of scalar and vector summary functions
mean(), min(), max(), sum(), sd(), var(), cumsum(), diff(),
pmin(), pmax()
// finding values
match(), self_match(), which_max(), which_min()

Very convenient: the d/p/q/r functions for standard distributions are implemented, too.

NumericVector x = rnorm(100);
36 / 45

Rcpp — Standard Template Library (STL)

The C++ STL is an extensive collection of generic data structures, algorithms and iterators. We will discuss simple examples of some components.

The cplusplus reference provides a comprehensive summary of the available libraries.

The following collections are recommended for beginners to start with:

  • numeric — operations on sequences of numeric values

  • algorithm — functions which work on ranges of elements using iterators

The following examples show .cpp-files as some require a different header than included by default

37 / 45

Rcpp — STL Iterators

Iterator objects come with operators that allow to iterate over a range elements in in a more abstract way than a classic for() loop does with an indexing variable.

Example: NumericVector::iterator

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum_it(NumericVector x) {
double total = 0;
NumericVector::iterator it;
for(it = x.begin(); it != x.end(); it++) {
total += *it;
}
return total;
}
  • Each Rcpp vector type has a separate iterator

  • Note that we use the dereference operator *it to get the value of the current element

38 / 45

Rcpp — STL Iterators

There are functions which accept iterators as arguments.

Example: apply()-style functions

#include <numeric>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum_accu(NumericVector x) {
return std::accumulate(x.begin(), x.end(), 0.0);
}
  • accumulate() is in the numeric STL so we have to include it in the header

  • Setting the starting value 0.0 ensures that a double is returned (use 0 for an integer)

39 / 45

Rcpp — STL Algorithms

A very useful STL algorithm is transform() which applies an operation sequentially to elements in (up to two) ranges and stores the result in another range.

Example: std::transform()

We may use transform() to square all elements of a vector.

#include <Rcpp.h>
using namespace Rcpp;
double f(double x) {
return std::pow(x, 2);
}
// [[Rcpp::export]]
std::vector<double> square(std::vector<double> x) {
std::transform(x.begin(), x.end(), x.begin(), f);
return x;
}

Note that we do not need to export f() because it is only a subroutine of square()

40 / 45

Rcpp — STL Algorithms

Example: (partial) sort

The following Rcpp function returns a copy of the input vector with the first n elements sorted.

// [[Rcpp::export]]
NumericVector nth_partial_sort(NumericVector x, int nth) {
NumericVector y = clone(x);
std::nth_element(y.begin(), y.begin()+nth, y.end());
std::sort(y.begin(), y.begin()+nth);
return y;
}
41 / 45

Rcpp — STL Algorithms

Example: (partial) sort

Let's benchmark nth_partial_sort() against R's sort() for a full sort.

x <- rnorm(100)
bench::mark(
nth_partial_sort(x, 100),
sort(x),
relative = T)
## # A tibble: 2 × 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 nth_partial_sort(x, 100) 1 1 10.5 2.62 10.5
## 2 sort(x) 12.9 11.8 1 1 1
42 / 45

Rcpp — Programming Strategy

  • Write most of your code in R to take advantage of its high abstraction level

  • Profile your code, identify bottlenecks and replace those sections (by writing entire functions) with C++ for performance improvement.

  • Make sure that your C++ routines have the same call signature as their slow R equivalents.

43 / 45

Rcpp — Exercises Part 2

  1. Implement the R function below using Rcpp for integer input (and output) and benchmark both versions.

    f <- function(n) {
    if(n < 3) return(n)
    return(f(n-1) + f(n-2))
    }
  2. Reconsider the function nth_partial_sort(). Compare its performance with that of R's sort() for large vectors. Explain your findings.

  3. Convert all(), range() and var() into C++. You may assume that the inputs have no missing values.

44 / 45

Thank You!

45 / 45

Rcpp — FAQs

What is Rcpp?

The Rcpp package allows to implement and easily invoke C++ functions in R by providing a clean and approachable API.

Who should use it?

Anyone who is interested in eliminating bottlenecks which cannot be sped-up in vanilla R.

Is Rcpp a self-contained language?

No. To write code using Rcpp basically means to write C++ code using special C++ classes that allow interfacing with R.

The official quick reference has some code snippets which facilitate the start.

2 / 45
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow