Extending R with C++:
A Brief Introduction to Rcpp
Dirk Eddelbuettela and James Joseph Balamutab
a
Debian and R Projects; Chicago, IL, USA; edd@debian.org; b Depts of Informatics and Statistics, Univ. of Illinois at Urbana-Champaign; Champaign, IL, USA; balamut2@illinois.edu
This version was compiled on January 11, 2022
R has always provided an application programming interface (API) for Chambers (2016, p. 4) builds and expands on this theme. Two
extensions. Based on the C language, it uses a number of macros and core facets of what “makes” R are carried over from the previous
other low-level constructs to exchange data structures between the R book. The first states what R is composed of: Everything that exists
process and any dynamically-loaded component modules authors added in R is an object. The second states how these objects are created
to it. With the introduction of the Rcpp package, and its later refinements, or altered: Everything that happens in R is a function call. A third
this process has become considerably easier yet also more robust. By statement is now added: Interfaces to other software are part of R.
now, Rcpp has become the most popular extension mechanism for R. This This last addition is profound. If and when suitable and per-
article introduces Rcpp, and illustrates with several examples how the formant software for a task exists, it is in fact desirable to have
Rcpp Attributes mechanism in particular eases the transition of objects a (preferably also perfomant) interface to this software from R.
between R and C++ code. Chambers (2016) discusses several possible approaches for simpler
interfaces and illustrates them with reference implementations to
applications and case studies | statistical computing | computationally intensive
both Python and Julia. However, the most performant interface for
methods | simulation
R is provided at the subroutine level, and rather than discussing
the older C interface for R, Chambers (2016) prefers to discuss
Introduction Rcpp. This article follows the same school of thought and aims to
introduce Rcpp to analysts and data scientists, aiming to enable
The R language and environment (R Core Team, 2021a) has es- them to use—and create— further interfaces for R which aid the
tablished itself as both an increasingly dominant facility for data mission while staying true to the prime directive. Adding interfaces
analysis, and the lingua franca for statistical computing in both in such a way is in fact a natural progression from the earliest de-
research and application settings. signs for its predecessor S which was after all designed to provide
Since the beginning, and as we argue below, “by design”, the a more useable ‘interface’ to underlying routines in Fortran.
R system has always provided an application programming in- The rest of the paper is structured as follows. We start by dis-
terface (API) suitable for extending R with code written in C or cussing possible first steps, chiefly to validate correct installations.
Fortran. Being implemented chiefly in R and C (with a generous This is followed by an introduction to simple C++ functions, com-
sprinkling of Fortran for well-established numerical subroutines), parison to the C API, a discussion of packaging with Rcpp and
R has always been extensible via a C interface. Both the actual a linear algebra example. The appendix contains some empirical
implementation and the C interface use a number of macros and illustrations of the adoption of Rcpp.
other low-level constructs to exchange data structures between
the R process and any dynamically-loaded component modules
authors added to it. First Steps with Rcpp
A C interface will generally also be accessible to other languages. Rcpp is a CRAN package and can be installed by using
Particularly noteworthy here is the C++ language, developed orig- install.packages('Rcpp') just like any other R package. On
inally as a ‘better C’, which is by its design very interoperable with some operating systems this will download pre-compiled binary
C. And with the introduction of the Rcpp package (Eddelbuettel packages; on others an installation from source will be attempted.
and François, 2011; Eddelbuettel, 2013; Eddelbuettel et al., 2022), But Rcpp is a little different from many standard R packages in
and its later refinements, this process of extending R has become one important aspect: it helps the user to write C(++) programs
considerably easier yet also more robust. To date, Rcpp has become more easily. The key aspect to note here is C++ programs: to
the most popular extension system for R. This article introduces operate, Rcpp needs not only R but also an additional toolchain
Rcpp, and illustrates with several examples how the Rcpp Attributes of a compiler, linker and more in order to be able to create binary
mechanism (Allaire et al., 2022) in particular eases the transition object code extending R.
of objects between R and C++ code. We note that this requirement is no different from what is
needed with base R when compilation of extensions is attempted.
Background. Chambers (2008, p. 3) provides a very thorough dis- How to achieve this using only base R is described in some detail
cussion of desirable traits for a system designed to program with in the Writing R Extensions manual (R Core Team, 2021b) that
data, and the R system in particular. Two key themes motivate is included with R. As for the toolchain requirements, on Linux
the introductory discussion. First, the Mission is to aid exploration and macOS, all required components are likely to be present. The
in order to provide the best platform to analyse data: “to boldly macOS can offer additional challenges as toolchain elements can
go where no one has gone before.” Second, the Prime Directive be obtained in different ways. Some of these are addressed in the
is that the software systems we build must be trustworthy: “the Rcpp FAQ (Eddelbuettel and François, 2022a) in sections 2.10 and
many computational steps between original data source and dis- 2.16. On Windows, users will have to install the Rtools kit provided
played result must all be trustful.” The remainder of the book then by R Core available at https://cran.r-project.org/bin/windows/Rtools/.
discusses R, leading to two final chapters on interfaces. Details of these installation steps are beyond the scope of this pa-
https://cran.r-project.org/package=Rcpp Rcpp Vignette | January 11, 2022 | 1–8
per. However, many external resources exist that provide detailed similarly to the earlier example involving evalCpp(), the C++
installation guides for R toolschains in Windows and macOS. code is both compiled and linked, and then imported into R under
As a first step, and chiefly to establish that the toolchain is set the name of the function supplied (e.g. here isOddCpp()).
up correctly, consider a minimal use case such as the following:
library("Rcpp")
library("Rcpp") cppFunction("
evalCpp("2 + 2") bool isOddCpp(int num = 10) {
# [1] 4 bool result = (num % 2 == 1);
return result;
Here the Rcpp package is loaded first via the library() func- }")
tion. Next, we deploy one of its simplest functions, evalCpp(), isOddCpp(42L)
which is described in the Rcpp Attributes vignette (Allaire et al., # [1] FALSE
2022). It takes the first (and often only) argument—a character
object—and evaluates it as a minimal C++ expression. The value
assignment and return are implicit, as is the addition of a trailing Extending R via its C API
semicolon and more. In fact, evalCpp() surrounds the expression Let us first consider the case of ‘standard R’, i.e. the API as defined
with the required ‘glue’ to make it a minimal source file which in the core R documentation. Extending R with routines written
can be compiled, linked and loaded. The exact details behind this using the C language requires the use of internal macros and
process are available in-depth when the verbose option of the functions documented in Chapter 5 of Writing R Extensions (R Core
function is set. If everything is set up correctly, the newly-created Team, 2021b).
R function will be returned.
While such a simple expression is not interesting in itself, it #include <R.h>
serves a useful purpose here to unequivocally establish whether #include <Rinternals.h>
Rcpp is correctly set up. Having accomplished that, we can proceed
to the next step of creating simple functions. SEXP convolve2(SEXP a, SEXP b) {
int na, nb, nab;
A first C++ function using Rcpp double *xa, *xb, *xab;
SEXP ab;
As a first example, consider the determination of whether a number
is odd or even. The default practice is to use modular arithmetic a = PROTECT(coerceVector(a, REALSXP));
to check if a remainder exists under x mod 2. Within R, this can b = PROTECT(coerceVector(b, REALSXP));
be implemented as follows: na = length(a); nb = length(b);
isOddR <- function(num = 10L) { nab = na + nb - 1;
result <- (num %% 2L == 1L) ab = PROTECT(allocVector(REALSXP, nab));
return(result) xa = REAL(a); xb = REAL(b); xab = REAL(ab);
} for (int i = 0; i < nab; i++)
isOddR(42L) xab[i] = 0.0;
# [1] FALSE for (int i = 0; i < na; i++)
for (int j = 0; j < nb; j++)
The operator %% implements the mod operation in R. For the xab[i + j] += xa[i] * xb[j];
default (integer) argument of ten used in the example, 10 mod 2 UNPROTECT(3);
results in zero, which is then mapped to FALSE in the context of a return ab;
logical expression. }
Translating this implementation into C++, several small details
have to be considered. First and foremost, as C++ is a statically- This function computes a convolution of two vectorsPsupplied
typed language, there needs to be additional (compile-time) infor- on input, a and b, which is defined to be abk+1 = ai · b j .
i+ j==k
mation provided for each of the variables. Specifically, a type, i.e. Before computing the convolution (which is really just the three
the kind of storage used by a variable must be explicitly defined. lines involving two nested for loops with indices i and j), a total
Typed languages generally offer benefits in terms of both correct- of ten lines of mere housekeeping are required. Vectors a and b
ness (as it is harder to accidentally assign to an ill-matched type) are coerced to double, and a results vector ab is allocated. This
and performance (as the compiler can optimize code based on the expression involves three calls to the PROTECT macro for which
storage and cpu characteristics). Here we have an int argument, a precisely matching UNPROTECT(3) is required as part of the in-
but return a logical, or bool for short. Two more smaller differ- terfacing of internal memory allocation. The vectors are accessed
ences are that each statement within the body must be concluded through pointer equivalents xa, xb and xab; and the latter has to
with a semicolon, and that return does not require parentheses be explicitly zeroed prior to the convolution calculation involving
around its argument. A graphical breakdown of all aspects of a incremental summary at index i + j.
corresponding C++ function is given in Figure 1.
When using Rcpp, such C++ functions can be directly em-
Extending R via the C++ API of Rcpp
bedded and compiled in an R script file through the use of the
cppFunction() provided by Rcpp Attributes (Allaire et al., 2022). Using the idioms of Rcpp, the above example can be written in a
The first parameter of the function accepts string input that rep- much more compact fashion—leading to code that is simpler to
resents the C++ code. Upon calling the cppFunction(), and read and maintain.
2 | https://cran.r-project.org/package=Rcpp Eddelbuettel and Balamuta
Fig. 1. Graphical annotation of the is_odd_cpp function.
#include "Rcpp.h" is already initialized at zero as well, reducing the entire function
using namespace Rcpp; to just the three lines for the two nested loops, plus some vari-
able declarations and the return statement. The resulting code is
// [[Rcpp::export]] shorter, easier to read, comprehend and maintain. Furthermore,
NumericVector the Rcpp code is more similar to traditional R code, which reduces
convolve_cpp(const NumericVector& a, the barrier of entry.
const NumericVector& b) {
Data Driven Performance Decisions with Rcpp
// Declare loop counters, and vector sizes
int i, j, When beginning to implement an idea, more so an algorithm, there
na = a.size(), nb = b.size(), are many ways one is able to correctly implement it. Prior to the
nab = na + nb - 1; routine being used in production, two questions must be asked:
1. Does the implementation produce the correct results?
// Create vector filled with 0
2. What implementation of the routine is the best?
NumericVector ab(nab);
The first question is subject to a binary pass-fail unit test verifi-
// Crux of the algorithm cation while the latter question is where the details of an imple-
for(i = 0; i < na; i++) { mentation are scrutinized to extract maximal efficiency from the
for(j = 0; j < nb; j++) { routine. The quality of the best routine follows first and foremost
ab[i + j] += a[i] * b[j]; from its correctness. To that end, R offers many different unit
} testing frameworks such as RUnit by Burger et al. (2018), which is
} used to construct Rcpp’s 1385+ unit tests, and testthat by Wickham
(2011). Only when correctness is achieved is it wise to begin the
// Return result procedure of optimizing the efficiency of the routine and, in turn,
return ab; selecting the best routine.
} Optimization of an algorithm involves performing a quantitative
analysis of the routine’s properties. There are two main approaches
To deploy such code from within an R script or ses- to analyzing the behavior of a routine: theoretical analysis1 or an
sion, first save it into a new file—which could be called con- empirical examination using profiling tools.2 Typically, the latter
volve.cpp—in either the working directory, a temporary direc- option is more prominently used as the routine’s theoretical prop-
toy or a project directory. Then from within the R session, use erties are derived prior to an implementation being started. Often
Rcpp::sourceCpp("convolve.cpp") (possibly using a path as the main concern regarding an implementation in R relates to the
well as the filename). This not only compiles, links and loads speed of the algorithm as it impacts how quickly analyses can be
the code within the external file but also adds the necessary done and reports can be provided to decision makers. Coinciden-
“glue” to make the Rcpp function available in the R environ- tally, the speed of code is one of the key governing use cases of
ment. Once the code is compiled and linked, call the newly-created Rcpp. Profiling R code will reveal shortcomings related to loops,
convolve_cpp() function with the appropriate parameters as e.g. for, while, and repeat; conditional statements, e.g. if-else
done in previous examples. 1
Theoretical analysis is often directed to describing the limiting behavior of a function through asymptotic
What is notable about the Rcpp version is that it has no PROTECT notation, commonly referred to as Big O and denoted as O (·).
2
or UNPROTECT which not only frees the programmer from a tedious Within base R, profiling can be activated by utils::Rprof() for individual command timing information,
utils::Rprofmem() for memory information, and System.time({}) for a quick overall execution
(and error-prone) step but more importantly also shows that mem- timing. Additional profiling R packages such as profvis by Chang et al. (2020), Rperform by Tandon and
ory management can be handled automatically. The result vector Hocking (2015), and benchmarking packages have extended the ability to analyze performance.
Eddelbuettel and Balamuta Rcpp Vignette | January 11, 2022 | 3
if-else and switch; and recursive functions, i.e. a function writ- set.seed(123)
ten in terms of itself such that the problem is broken down on each evalCpp("R::rnorm(0, 1)")
call in a reduced state until an answer can be obtained. In contrast, # [1] -0.56048
the overhead for such operations is significantly less in C++. Thus,
critical components of a given routine should be written in Rcpp One important aspect of the behind-the-scenes code generation
to capture maximal efficiency. for the single expression (as well as all code created via Rcpp
Returning to the second question, to decide which implementa- Attributes) is the automatic preservation of the state of the random
tion works the best, one needs to employ a benchmark to obtain nunber generators in R. This means that from a given seed, we
quantifiable results. Benchmarks are an ideal way to quantify how will receive identical draws of random numbers whether we access
well a method performs because they have the ability to show the them from R or via C++ code accessing the same generators (via
amount of time the code has been running and where bottlenecks the Rcpp interfaces). To illustrate, the same number is drawn via
exist within functions. This does not imply that benchmarks are R code after resetting the seed:
completely infallible as user error can influence the end results.
For example, if a user decides to benchmark code in one R session set.seed(123)
and in another session performs a heavy computation, then the # Implicit mean of 0, sd of 1
benchmark will be biased (if “wall clock” is measured). rnorm(1)
There are different levels of magnification that a benchmark # [1] -0.56048
can provide. For a more macro analysis, one should benchmark
data using benchmark(test = func(), test2 = func2()), a We can make the Rcpp Sugar function rnorm() accessible from
function from the rbenchmark R package by Kusnierczyk (2012). R in the same way to return a vector of values:
This form of benchmarking will be used when the computation is
set.seed(123)
more intensive. The motivating example isOdd() (which is only
evalCpp("Rcpp::rnorm(3)")
able to accept a single integer) warrants a much more microscopic
# [1] -0.56048 -0.23018 1.55871
timing comparison. In cases such as this, the objective is to obtain
precise results in the amount of nanoseconds elapsed. Using the
microbenchmark function from the microbenchmark R package Note that we use the Rcpp:: namespace explicitly here to con-
by Mersmann (2021) is more helpful to obtain timing information. trast the vectorised Rcpp::rnorm() with the scalar R::rnorm()
To perform the benchmark: also provided as a convenience wrapper for the C API of R.
And as expected, this too replicates from R as the very same
library("microbenchmark") generators are used in both cases along with consistent handling
results <- microbenchmark(isOddR = isOddR(12L), of generator state permitting to alternate:
isOddCpp = isOddCpp(12L))
set.seed(123)
print(summary(results)[, c(1:7)],digits=1)
rnorm(3)
# [1] -0.56048 -0.23018 1.55871
By looking at the summary of 100 evaluations, we note that
the Rcpp function performed better than the equivalent in R by
achieving a lower run time on average. The lower run time in this Translating Code from R into Rcpp: Bootstrap Example
part is not necessarily critical as the difference is nanoseconds on a
Statistical inference relied primarily upon asymptotic theory until
trivial computation. However, each section of code does contribute
Efron (1979) proposed the bootstrap. Bootstrapping is known to
to a faster overall runtime.
be computationally intensive due to the need to use loops. Thus, it
is an ideal candidate to use as an example. Before starting to write
Random Numbers within Rcpp: An Example of Rcpp Sugar C++ code using Rcpp , prototype the code in R.
Rcpp connects R with C++. Only the former is vectorized: C++ is # Function declaration
not. Rcpp Sugar, however, provides a convenient way to work with bootstrap_r <- function(ds, B = 1000) {
high-performing C++ functions in a similar way to how R offers
vectorized operations. The Rcpp Sugar vignette (Eddelbuettel and # Preallocate storage for statistics
François, 2022b) details these, as well as many more functions boot_stat <- matrix(NA, nrow = B, ncol = 2)
directly accessible to Rcpp in a way that should feel familiar to
R users. Some examples of Rcpp Sugar functions include special # Number of observations
math functions like gamma and beta, statistical distributions and n <- length(ds)
random number generation.
We will illustrate a case of random number generation. Consider # Perform bootstrap
drawing one or more N (0, 1)-distributed random variables. The for(i in seq_len(B)) {
very simplest case can just use evalCpp(): # Sample initial data
evalCpp("R::rnorm(0, 1)") gen_data <- ds[ sample(n, n, replace=TRUE) ]
# [1] -0.023751 # Calculate sample data mean and SD
boot_stat[i,] <- c(mean(gen_data),
sd(gen_data))
By setting a seed, we can make this reproducible:
}
4 | https://cran.r-project.org/package=Rcpp Eddelbuettel and Balamuta
# Return bootstrap result zero-based indexing system, meaning that they begin at 0 instead
return(boot_stat) of 1 and go up to n − 1 instead of n, which is unlike R’s system.
} Thus, an out of bounds error would be triggered if n was used as
that point does not exist within the data structure. The application
Before continuing, check that the initial prototype R code works. of this logic can be seen in the span the for loop takes in C++
To do so, write a short R script. Note the use of set.seed() to when compared to R. Another syntactical change is the use of ()
ensure reproducible draws. in place of [] while accessing the matrix. This change is due to the
# Set seed to generate data governance of C++ and its comma operator making it impossible
set.seed(512) to place multiple indices inside the square brackets.
# Generate data To validate that the translation was successful, first run the
initdata <- rnorm(1000, mean = 21, sd = 10) C++ function under the same data and seed as was given for the
# Set a new _different_ seed for bootstrapping R function.
set.seed(883) # Use the same seed use in R and C++
# Perform bootstrap set.seed(883)
result_r <- bootstrap_r(initdata) # Perform bootstrap with C++ function
result_cpp <- bootstrap_cpp(initdata)
Figure 2 shows that the bootstrap procedure worked well!
With reassurances that the method to be implemented within
Next, check the output between the functions using R ’s
Rcpp works appropriately in R, proceed to translating the code
all.equal() function that allows for an ǫ-neighborhood around
into Rcpp. As indicated previously, there are many convergences
a number.
between Rcpp syntax and base R via Rcpp Sugar.
# Compare output
#include <Rcpp.h> all.equal(result_r, result_cpp)
# [1] "Mean relative difference: 0.019931"
// Function declaration with export tag
// [[Rcpp::export]]
Lastly, make sure to benchmark the newly translated Rcpp func-
Rcpp::NumericMatrix
tion against the R implementation. As stated earlier, data is
bootstrap_cpp(Rcpp::NumericVector ds,
paramount to making a decision related to which function to use
int B = 1000) {
in an analysis or package.
// Preallocate storage for statistics library(rbenchmark)
Rcpp::NumericMatrix boot_stat(B, 2);
benchmark(r = bootstrap_r(initdata),
// Number of observations cpp = bootstrap_cpp(initdata))[, 1:4]
int n = ds.size(); # test replications elapsed relative
# 2 cpp 100 1.469 1.000
// Perform bootstrap # 1 r 100 4.552 3.099
for(int i = 0; i < B; i++) {
// Sample initial data
Rcpp::NumericVector gen_data = Using Rcpp as an Interface to External Libraries: Exploring
ds[ floor(Rcpp::runif(n, 0, n)) ];
Linear Algebra Extensions
// Calculate sample mean and std dev
boot_stat(i, 0) = mean(gen_data); Many of the previously illustrated Rcpp examples were directed
boot_stat(i, 1) = sd(gen_data); primarily to show the gains in computational efficiency that are
} possible by implementing code directly in C++; however, this is
only one potential application of Rcpp. Perhaps one of the most
// Return bootstrap results understated features of Rcpp is its ability to enable Chambers
return boot_stat; (2016)’s third statement of Interfaces to other software are part of
} R. In particular, Rcpp is designed to facilitate interfacing libraries
written in C++ or C to R. Hence, if there is a specific feature
In the Rcpp version of the bootstrap function, there are a few within a C++ or C library, then one can create a bridge to it using
additional changes that occurred during the translation. In par- Rcpp to enable it from within R.
ticular, the use of Rcpp::runif(n, 0, n) enclosed by floor(), An example is the use of C++ matrix algebra libraries like Ar-
which rounds down to the nearest integer, in place of sample(n, madillo (Sanderson, 2010) or Eigen (Guennebaud et al., 2012). By
n, replace = TRUE) to sample row ids. This is an equivalent outsourcing complex linear algebra operations to matrix libraries,
substitution since equal weight is being placed upon all row ids the need to directly call functions within Linear Algebra PACK-
and replacement is allowed.3 Note that the upper bound of the age (LAPACK) (Anderson et al., 1999) is negated. Moreover, the
interval, n, will never be reached. While this may seem flawed, Rcpp design allows for seamless transfer between object types by
it is important to note that vectors and matrices in C++ use a using automatic converters governed by wrap(), C++ to R , and
3
as<T>(), R to C++ with the T indicating the type of object being
For more flexibility in sampling see Christian Gunning’s Sample extension for RcppArmadillo and
Rcpp Gallery: Using the RcppArmadillo-based Implementation of R’s sample() or consider using the
cast into. These two helper functions provide a non-invasive way
Rcpp::sample() sugar function added in 0.12.9 by Nathan Russell. to work with an external object. Thus, a further benefit to using
Eddelbuettel and Balamuta Rcpp Vignette | January 11, 2022 | 5
Mean Bootstrap SD Bootstrap
1.2
1.5
0.8
Density
Density
1.0
0.4
0.5
0.0
0.0
20.0 20.5 21.0 21.5 9.5 10.0 10.5
Samples Samples
Fig. 2. Results of the bootstrapping procedure for sample mean and variance.
external C++ libraries is the ability to have a portable code base // Use the RcppArmadillo package
that can be implemented within a standalone C++ program or // Requires different header file from Rcpp.h
within another computational language. #include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
Compute RNG draws from a multivariate Normal. A common appli-
cation in statistical computing is simulating from a multivariate nor- With this in mind, sampling from a multivariate normal distri-
mal distribution. The algorithm relies on a linear transformation of bution can be obtained in a straightforward manner. Using only
the standard Normal distribution. Letting Y m×1 = Am×n Z n×1 + b m×1 , Armadillo data types and values:
where A is a m × n matrix, b ∈ Rm , Z ∼ N (0n , I n ), and I n is the #include <RcppArmadillo.h>
identity matrix, then Y ∼ Nm µ = b, Σ = AA T . To obtain the ma- // [[Rcpp::depends(RcppArmadillo)]]
trix A from Σ, either a Cholesky or Eigen decomposition is required.
As noted in Venables and Ripley (2002), the Eigen decomposition // Sample N x P observations from a Standard
is more stable in addition to being more computationally demand- // Multivariate Normal given N observations, a
ing compared to the Cholesky decomposition. For simplicity and // vector of P means, and a P x P cov matrix
speed, we have opted to implement the sampling procedure using // [[Rcpp::export]]
a Cholesky decomposition. Regardless, there is a need to involve arma::mat rmvnorm(int n,
one of the above matrix libraries to make the sampling viable in const arma::vec& mu,
C++. const arma::mat& Sigma) {
Here, we demonstrate how to take advantage of the Ar- unsigned int p = Sigma.n_cols;
madillo linear algebra template classes (Sanderson and Curtin,
2016) via the RcppArmadillo package (Eddelbuettel and Sander- // First draw N x P values from a N(0,1)
son, 2014; Eddelbuettel et al., 2021). Prior to running this Rcpp::NumericVector draw = Rcpp::rnorm(n*p);
example, the RcppArmadillo package must be installed using
install.packages('RcppArmadillo').4 One important caveat // Instantiate an Armadillo matrix with the
when using additional packages within the Rcpp ecosystem is // drawn values using advanced constructor
the correct header file may not be Rcpp.h. In a majority of // to reuse allocated memory
cases, the additional package ships a dedicated header (as e.g. arma::mat Z = arma::mat(draw.begin(), n, p,
RcppArmadillo.h here) which not only declares data structures false, true);
from both systems, but may also add complementary integration
and conversion routines. It typically needs to be listed in an // Simpler, less performant alternative
include statement along with a depends() attribute to tell R // arma::mat Z = Rcpp::as<arma::mat>(draw);
where to find the additional header files:
// Generate a sample from the Transformed
4
macOS users may encounter ‘-lgfortran‘ and ‘-lquadmath‘ errors on compilations with this package if the // Multivariate Normal
development environment is not appropriately set up. Section 2.16 of the Rcpp FAQ provides details
regarding the necessary ‘gfortran‘ binaries.
6 | https://cran.r-project.org/package=Rcpp Eddelbuettel and Balamuta
arma::mat Y = arma::repmat(mu, 1, n).t() +
Z * arma::chol(Sigma);
return Y;
}
As a result of using a random number generation (RNG), there
is an additional requirement to ensure reproducible results: the
necessity to explicitly set a seed (as shown above). Because of
the (programmatic) interface provided by R to its own RNGs, this
setting of the seed has to occur at the R level via the set.seed()
function as no (public) interface is provided by the R header files.
Faster linear model fits. As a second example, consider the problem
of estimating a common linear model repeatedly. One use case
might be the simulation of size and power of standard tests. Many
users of R would default to using lm(), however, the overhead
associated with this function greatly impacts speed with which an
estimate can be obtained. Another approach would be to take the
base R function lm.fit(), which is called by lm(), to compute Fig. 3. Graphical annotation of the is_odd_cpp function.
estimated β̂ in just about the fastest time possible. However, this
approach is also not viable as it does not report the estimated
standard errors. As a result, we cannot use any default R functions The interface is very simple: a matrix X n×p of regressors, and
in the context of simulating finite sample population effects on a dependent variable yn×1 as a vector. We invoke the standard
inference. Armadillo function solve() to fit the model y ~ X.5 We then
One alternative is provided by the fastLm() function in Rcp- compute residuals, and extract the (appropriately scaled) diagonal
pArmadillo (Eddelbuettel et al., 2021). of the covariance matrix, also taking its square root, in order to
return both estimates β̂ and σ̂.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
Rcpp in Packages
// Compute coefficients and their standard error Once a project containing compiled code has matured to the point of
// during multiple linear regression given a sharing it with collaborators6 or using it within a parallel computing
// design matrix X containing N observations with environments, the ideal way forward is to embed the code within
// P regressors and a vector y containing of an R package. Not only does an R package provide a way to
// N responses automatically compile source code, but also enables the use of the
// [[Rcpp::export]] R help system to document how the written functions should be
Rcpp::List fastLm(const arma::mat& X, used. As a further benefit, the package format enables the use of
const arma::colvec& y) { unit tests to ensure that the functions are producing the correct
// Dimension information output. Lastly, having a package provides the option of uploading
int n = X.n_rows, p = X.n_cols; to a repository such as CRAN for wider dissemination.
// Fit model y ~ X To facilitate package building, Rcpp provides a function
arma::colvec coef = arma::solve(X, y); Rcpp.package.skeleton() that is modeled after the base R func-
// Compute the residuals tion package.skeleton(). This function automates the creation
arma::colvec res = y - X*coef; of a skeleton package appropriate for distributing Rcpp:
// Estimated variance of the random error
double s2 = library("Rcpp")
std::inner_product(res.begin(), res.end(), Rcpp.package.skeleton("samplePkg")
res.begin(), 0.0)
/ (n - p); This shows how distinct directories man, R, src are created for,
respectively, the help pages, files with R code and files with C++
// Standard error matrix of coefficients code. Generally speaking, all compiled code, be it from C, C++ or
arma::colvec std_err = arma::sqrt(s2 * Fortran sources, should be placed within the src/ directory.
arma::diagvec(arma::pinv(X.t()*X))); Alternatively, one can achieve similar results to using
Rcpp.package.skeleton() by using a feature of the RStudio IDE.
// Create named list with the above quantities
return Rcpp::List::create( 5
We should note that this will use the standard LAPACK functionality via Armadillo whereas R uses an
Rcpp::Named("coefficients") = coef, internal refinement of LINPACK (Dongarra et al., 1979) via pivoting, rendering the operation numerically
more stable. That is an important robustness aspect—though common datasets on current hardware almost
Rcpp::Named("stderr") = std_err, never lead to actual differences. That said, if in doubt, stick with the R implementation. What is shown here
Rcpp::Named("df.residual") = n - p ); is mostly for exposition of the principles.
6
} It is sometimes said that every project has two collaborators: self, and future self. Packaging code is best
practices even for code not intended for public uploading.
Eddelbuettel and Balamuta Rcpp Vignette | January 11, 2022 | 7
Specifically, while creating a new package project there is an op- Eddelbuettel D, Balamuta JJ (2017). “Extending R with C++: A Brief Introduction
tion to select the type of package by engaging a dropdown menu to Rcpp.” PeerJ Preprints, 5. doi:10.7287/peerj.preprints.3188v1/.
to select “Package w/ Rcpp” in RStudio versions prior to v1.1.0. URL https://doi.org/10.7287/peerj.preprints.3188v1/.
In RStudio versions later than v1.1.0, support for package tem- Eddelbuettel D, Balamuta JJ (2018). “Extending R with C++: A Brief
plates has been added allowing users to directly create Rcpp-based Introduction to Rcpp.” The American Statistician, 72(1). doi:
10.1080/00031305.2017.1375990. URL https://doi.org/10.1080/
packages that use Eigen or Armadillo.
00031305.2017.1375990.
Lastly, one more option exists for users who are familiar Eddelbuettel D, François R (2011). “Rcpp: Seamless R and C++
with the devtools R package. To create the R package skele- Integration.” Journal of Statistical Software, 40(8), 1–18. doi:
ton use devtools::create("samplePkg"). From here, part 10.18637/jss.v040.i08. URL https://doi.org/10.18637/jss.v040.i08.
of the structure required by Rcpp can be added by using Eddelbuettel D, François R (2022a). Frequently Asked Questions About Rcpp.
devtools::use_rcpp(). The remaining aspects needed by Rcpp Vignette included in R package Rcpp, URL https://CRAN.R-Project.org/
must be manually copied from the roxygen tags written to con- package=Rcpp.
sole and pasted into one of the package’s R files to successfully Eddelbuettel D, François R (2022b). Rcpp syntactic sugar. Vignette included in
incorporate the dynamic library and link to Rcpp’s headers. R package Rcpp, URL https://CRAN.R-Project.org/package=Rcpp.
Eddelbuettel D, François R (2022c). Writing a package that uses Rcpp. Vignette
All of these methods take care of a number of small settings
included in R package Rcpp, URL https://CRAN.R-Project.org/package=
one would have to enable manually otherwise. These include an
Rcpp.
‘Imports:’ and ‘LinkingTo:’ declaration in file DESCRIPTION, as Eddelbuettel D, François R, Allaire J, Ushey K, Kou Q, Russel N, Chambers J,
well as ‘useDynLib’ and ‘importFrom’ in NAMESPACE. For Rcpp At- Bates D (2022). Rcpp: Seamless R and C++ Integration. R package version
tributes use, the compileAttributes() function has to be called. 1.0.8, URL https://CRAN.R-Project.org/package=Rcpp.
Similarly, to take advantage of its documentation-creation feature, Eddelbuettel D, François R, Bates D, Ni B (2021). RcppArmadillo: Rcpp in-
the roxygenize() function from roxygen2 has to be called.7 Ad- tegration for Armadillo templated linear algebra library. R package version
ditional details on using Rcpp within a package scope are detailed 0.10.7.5.0, URL https://CRAN.R-Project.org/package=RcppArmadillo.
in Eddelbuettel and François (2022c). Eddelbuettel D, Horner J (2021). littler: R at the Command-Line via r. R package
version 0.3.15, URL https://CRAN.R-Project.org/package=littler.
Eddelbuettel D, Sanderson C (2014). “RcppArmadillo: Accelerating R with
Conclusion High-Performance C++ Linear Algebra.” Computational Statistics and Data
Analysis, 71, 1054–1063. doi:10.1016/j.csda.2013.02.005. URL
R has always provided mechanisms to extend it. The bare-bones C https://dx.doi.org/10.1016/j.csda.2013.02.005.
API is already used to great effect by a large number of packages. Efron B (1979). “Bootstrap Methods: Another Look at the Jackknife.” The Annals
By taking advantage of a number of C++ features, Rcpp has been of Statistics, 7(1), 1–26. URL https://www.jstor.org/stable/2958830.
able to make extending R easier, offering a combination of both Guennebaud G, Jacob B, et al. (2012). “Eigen v3.” URL https://eigen.tuxfamily.
speed and ease of use that has been finding increasingly widespread org.
utilization by researchers and data scientists. We are thrilled about Kusnierczyk W (2012). rbenchmark: Benchmarking routine for R. R package
this adoption, and look forward to seeing more exciting extensions version 1.0.0, URL https://CRAN.R-Project.org/package=rbenchmark.
Mersmann O (2021). microbenchmark: Accurate Timing Functions. R package
to R being built.
version 1.4-9, URL https://CRAN.R-Project.org/package=microbenchmark.
R Core Team (2021a). R: A Language and Environment for Statistical Computing.
Acknowledgments. We thank Bob Rudis and Lionel Henry for ex- R Foundation for Statistical Computing, Vienna, Austria. URL https://www.
cellent comments and suggestion on an earlier draft of this manuscript. R-project.org/.
Furthermore, we appreciate the improved C++ annotated function graphic R Core Team (2021b). Writing R extensions. R Foundation for Statistical
provided by Bob Rudis. This version is a pre-print of Eddelbuettel and Computing, Vienna, Austria. URL https://CRAN.R-Project.org/doc/manuals/
Balamuta (2017, 2018).
R-exts.html.
Sanderson C (2010). “Armadillo: An open source C++ Algebra Library for Fast
References Prototyping and Computationally Intensive Experiments.” Technical report,
NICTA. URL https://arma.sf.net.
Allaire JJ, Eddelbuettel D, François R (2022). Rcpp Attributes. Vignette included Sanderson C, Curtin R (2016). “Armadillo: A Template-Based C++ Library for
in R package Rcpp, URL https://CRAN.R-Project.org/package=Rcpp. Linear Algebra.” JOSS, 1(2). doi:10.21105/joss.00026. URL https:
Anderson E, Bai Z, Bischof C, Blackford S, Demmel J, Dongarra J, Du Croz J, //dx.doi.org/10.21105/joss.00026.
Greenbaum A, Hammarling S, McKenney A, Sorensen D (1999). LAPACK Tandon A, Hocking TD (2015). Rperform: Rperform - Performance testing for R
Users’ Guide. Third edition. Society for Industrial and Applied Mathematics, packages. R package version 0.0.0.9000.
Philadelphia, PA. ISBN 0-89871-447-8 (paperback). Venables WN, Ripley BD (2002). Modern Applied Statistics with S. Fourth edition.
Burger M, Juenemann K, Koenig T (2018). RUnit: R Unit Test Framework. R Springer, New York. ISBN 0-387-95457-0, URL https://www.stats.ox.ac.uk/
package version 0.4.32, URL https://CRAN.R-Project.org/package=RUnit. pub/MASS4.
Chambers JM (2008). Software for Data Analysis: Programming with R. Statistics Wickham H (2011). “testthat: Get Started with Testing.” The R Journal, 3, 5–10.
and Computing. Springer-Verlag, Heidelberg. ISBN 978-0-387-75935-7.
Chambers JM (2016). Extending R. The R Series. Chapman and Hall/CRC,
London. ISBN 9781498775717.
Chang W, Luraschi J, , Mastny T (2020). profvis: Interactive Visualizations for
Profiling R Code. R package version 0.3.7, URL https://CRAN.R-Project.org/
package=profvis.
Dongarra JJ, Moler CB, Bunch JR, Stewart GW (1979). LINPACK users’ guide.
SIAM.
Eddelbuettel D (2013). Seamless R and C++ Integration with Rcpp. Use R!
Springer, New York. ISBN 978-1-4614-6867-7.
7
The littler package (Eddelbuettel and Horner, 2021) has a helper script ‘roxy.r‘ for this.
8 | https://cran.r-project.org/package=Rcpp Eddelbuettel and Balamuta