Thirteen Simple Steps for Creating An R Package
with an External C++ Library
Dirk Eddelbuettel
Department of Statistics, University of Illinois, Urbana-Champaign, IL, USA
This version was compiled on January 11, 2022
We describe how we extend R with an external C++ code library by using the Rcpp package. Our working example uses the recent machine learning library and application ‘Corels’, which provides optimal yet easily interpretable rule lists (Angelino et al., 2017), and which we bring to R in the form of the RcppCorels package (Eddelbuettel, 2019). We discuss each step in the process, and derive a set of simple rules and recommendations which are illustrated with the concrete example.

Introduction
The process of building a new package with Rcpp can range from the very easy (a single simple C++ function) to the very complex. If, and how, external resources are utilised makes a big difference, as this too can range from the very simple (making use of a header-only library, or directly including a few C++ source files without further dependencies) to the very complex.
Yet a lot of the important action happens in the middle ground. Packages may bring their own source code, but also depend on just one or two external libraries. This paper describes one such approach in detail: how we turned the Corels application (Angelino et al., 2017; Larus-Stone, 2019), provided as a standalone C++-based executable, into an R-callable package RcppCorels (Eddelbuettel, 2019) via Rcpp (Eddelbuettel et al., 2022; Eddelbuettel and François, 2011).

The Thirteen Key Steps
Ensure Use of a Suitable License. Before embarking on such a journey, it is best to ensure that the licensing framework is suitable. Many different open-source licenses exist, yet a few key ones dominate and can generally be used with each other. There is however a fair amount of possible legalese involved, so it is useful to check inter-license compatibility, as well as general usability of the license in question. Several sites can help via license recommendations and checks for interoperability: one example is choosealicense.com (which is backed by GitHub), another is tldrlegal.com. License choice is a complex topic, and general recommendations are difficult to make besides the key point of sticking to already-established and known licenses.

Ensure the Software builds. In order to see how hard it may be to combine an external entity, either a program or a library, with R, it helps to ensure that the external entity actually still builds and runs.
This may seem like a small and obvious step, but experience suggests that it is worth asserting the ability to build with current tools, and possibly also with more than one compiler or build system. Consideration of other platforms used by R also matters a great deal, as one of the strengths of the R package system is its ability to cover the three key operating system families.

Ensure it still works. This may seem like a variation on the previous point, but besides the ability to build we also need to ensure the ability to run the software. If the external entity has tests and demos, it is highly recommended to run them. If there are reference results, we should ensure that they are still obtained, and also that the run-time performance is still (at a minimum) reasonable.

Ensure it is compelling. This is of course a very basic litmus test: is the new software relevant? Is it helpful? Would others benefit from having it packaged and maintained?

Start an Rcpp package. The first step in getting a new package combining R and C++ is often the creation of a new Rcpp package. There are several helper functions to choose from. A natural first choice is Rcpp.package.skeleton() from the Rcpp package (Eddelbuettel et al., 2022). It can be improved by having the optional helper package pkgKitten (Eddelbuettel, 2021) around, as its kitten() function smoothes some rougher edges left by the underlying base R function package.skeleton(). This step is shown below in the appendix, and corresponds to the first commit, followed by a first edit of file DESCRIPTION.
Any code added by the helper functions, often just a simple helloWorld() variant, can be run to ensure that the package is indeed functional. More importantly, at this stage, we can also start building the package as a compressed tar archive (R CMD build) and run the R checker on it (R CMD check).

Integrate External Package. Given a basic package with C++ support, we can now turn to integrating the external package. The complexity of this step can, as alluded to earlier, vary from very easy to very complex. Simple cases include just depending on library headers which can either be copied to the package, or be provided by another package such as BH (Eddelbuettel et al., 2021). It may also be a dependency on a fairly standard library available on most if not all systems. The graphics formats bmp, jpeg or png may be examples; text formats like JSON or XML are another. One difficulty, though, may be that run-time support does not always guarantee compile-time support. In these cases, a -dev or -devel package may need to be installed.
In the concrete case of Corels, we
• copied all existing C++ source and header files over into the src/ directory;
• renamed all header files from *.hh to *.h to comply with an R preference;
• created a minimal src/Makevars file, initially with link instructions for GMP, later relaxed to conditional use of GMP (see below);
• moved main.cc to a subdirectory as we cannot build with another main() function (and R will not include files from subdirectories);
• added a minimal R-callable function along with a logger instance.
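The need for a logger instance in the last item can be illustrated with a small single-file sketch. Library code that refers to a global logger only sees a declaration; once main.cc no longer takes part in the build, some other file has to supply the definition, and the wrapper file is a natural place for it. The type SketchLogger and the functions run_step() and corels_placeholder() are hypothetical stand-ins chosen for this illustration and are not taken from the Corels sources.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the library's logger class.
struct SketchLogger {
    int messages = 0;
    void log(const std::string& /*text*/) { ++messages; }
};

// Declaration only: this is all that library sources get to see,
// typically through a shared header.
extern SketchLogger* logger;

// Library-side code that relies on the global instance.
int run_step() {
    assert(logger != nullptr);   // fails loudly if nobody set up the logger
    logger->log("step done");
    return logger->messages;
}

// Definition: formerly supplied by the file holding main(), now by the
// wrapper file. Without it, linking fails with an undefined reference.
SketchLogger* logger = nullptr;

// Hypothetical R-callable entry point (the Rcpp export tag is omitted
// so that the sketch compiles without R).
bool corels_placeholder() {
    static SketchLogger instance;  // lives until the process ends
    logger = &instance;
    return run_step() > 0;
}
```

Calling corels_placeholder() once sets up the instance and logs a single message, so the library-side code can run without the original main.cc being part of the build.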
https://arxiv.org/abs/1911.06416 Thirteen Steps for R and C++ Library Packages | January 11, 2022 | 1–4
Here, the last step was needed as the file main.cc provided a global instance referred to from other files. Hence, a minimal R-callable wrapper is being added at this stage (shown in the appendix as well). Actual functionality will be added later.
We will come back to the step concerning the link instructions. At this point we have a package for R also containing the library we want to add.

Make the External Code compliant with R Policies. R has fairly strict guidelines, defined both in the CRAN Repository Policy document at the CRAN website, and in the manual Writing R Extensions. Certain standard C and C++ functions are not permitted as their use could interfere with running code from R. This includes somewhat obvious recommendations (“do not call abort” as it would terminate the R session) but extends to not using native print methods in order to cooperate better with the input and output facilities of R. So here, and reflecting that last aspect, we changed all calls to printf() to calls to Rprintf(). Similarly, R prefers its own (well-tested) random-number generators so we replaced one (scaled) call to random() / RAND_MAX with the equivalent call to R’s unif_rand(). We also avoided one use of stdout in rulelib.h.
The requirement for such changes may seem excessive at first, but the value added stemming from consistent application of the CRAN Policies is appreciated by most R users.

Complete the Interface. In order to further test the package, and of course also for actual use, we need to expose the key parameters and arguments. Corels parsed command-line arguments; we can translate this directly into suitable arguments for the main function. At a first pass, we created the following interface:

// [[Rcpp::export]]
bool corels(std::string rules_file,
            std::string labels_file,
            std::string log_dir,
            std::string meta_file = "",
            bool run_bfs = false,
            bool calculate_size = false,
            bool run_curiosity = false,
            int curiosity_policy = 0,
            bool latex_out = false,
            int map_type = 0,
            int verbosity = 0,
            int max_num_nodes = 100000,
            double regularization = 0.01,
            int logging_frequency = 1000,
            int ablation = 0) {

    // actual function body omitted
}

Rcpp facilitates the integration by adding another wrapper exposing all the function arguments, and setting up required arguments without default (the first three) along with optional arguments given a default. The user can now call corels() from R with three required arguments (the two input files plus the log directory) as well as a number of optional arguments.

Add Sample Data. R packages can access data files that are shipped with them. That is a very useful feature, and we therefore also copy in the files included in the Corels repository and its data/ directory.

fs::dir_tree("../../../rcppcorels/inst/sample_data")
# ../../../rcppcorels/inst/sample_data
# +-- compas_test-binary.csv
# +-- compas_test.csv
# +-- compas_test.label
# +-- compas_test.out
# +-- compas_train-binary.csv
# +-- compas_train.csv
# +-- compas_train.label
# +-- compas_train.minor
# \-- compas_train.out

Set up working example. Combining the two preceding steps, we can now offer an illustrative example. It is included in the help page for function corels() and can be run from R via example("corels").

library(RcppCorels)

.sysfile <- function(f) # helper function
    system.file("sample_data", f, package="RcppCorels")

rules_file <- .sysfile("compas_train.out")
labels_file <- .sysfile("compas_train.label")
meta_file <- .sysfile("compas_train.minor")
logdir <- tempdir() # session-specific temporary directory

# check that all inputs and the output directory exist
stopifnot(file.exists(rules_file),
          file.exists(labels_file),
          file.exists(meta_file),
          dir.exists(logdir))

corels(rules_file, labels_file, logdir, meta_file,
       verbosity = 100,
       regularization = 0.015,
       curiosity_policy = 2, # by lower bound
       map_type = 1) # permutation map

cat("See ", logdir, " for result file.")

In the example, we pass the two required arguments for rules and labels files, the optional argument for the ‘meta’ file as well as an added required argument for the output directory. As R policy prohibits writing in user directories, we default to using the temporary directory of the current session, and report its value at the end. For other arguments default values are used.

Finesse Library Dependencies. One common difficulty when bringing an external library to R via a package consists in dealing with an external dependency. In the case of ‘Corels’, the GNU GMP library for multi-precision arithmetic is an optional extension which, if available, improves and accelerates internal processing.
The simplest approach is to declare a compile-time variable in the src/Makevars file. Using -DGMP defines the GMP variable at the level of the C and C++ code. One can then condition on the variable. A very standard approach, also used here, is #if defined(GMP) ... #else ... #endif where one of the two code branches is in effect depending on whether the GMP variable is defined or not.
In order to detect presence of a required (or optional) library, tools like ‘autoconf’ or ‘cmake’ are often used. For example, one can rely on an existing ‘autoconf’ macro provided by the GMP documentation to detect presence of the GNU GMP header and library. We are making use of this facility here to deploy GMP when it is available. As ‘Corels’ can be built with and without GMP, the build and installation succeeds either way, but deployment of the more featureful variant which uses GMP is automated.
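The conditional-compilation pattern described above can be sketched in a few lines. The helper names bit_backend() and count_common_ones() are hypothetical and chosen for illustration only; they are not taken from the Corels sources. The GMP branch uses the documented mpz functions and is only compiled when the build passes -DGMP and links against the GMP library; without the flag, a plain C++ fallback is used and no GMP header is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#if defined(GMP)
#include <gmp.h>   // only required when -DGMP is set and -lgmp is linked
#endif

// Reports which branch was compiled in (hypothetical helper).
const char* bit_backend() {
#if defined(GMP)
    return "gmp";
#else
    return "builtin";
#endif
}

// Counts positions holding '1' in both binary strings, with the two
// strings aligned at their rightmost digit (hypothetical helper).
std::size_t count_common_ones(const std::string& a, const std::string& b) {
#if defined(GMP)
    mpz_t x, y, both;
    mpz_init(x); mpz_init(y); mpz_init(both);
    mpz_set_str(x, a.c_str(), 2);          // parse as base-2 integers
    mpz_set_str(y, b.c_str(), 2);
    mpz_and(both, x, y);                   // bitwise AND
    std::size_t n = mpz_popcount(both);    // number of set bits
    mpz_clear(x); mpz_clear(y); mpz_clear(both);
    return n;
#else
    std::size_t n = 0;
    const std::size_t la = a.size(), lb = b.size();
    const std::size_t len = la < lb ? la : lb;
    for (std::size_t i = 1; i <= len; ++i) {
        if (a[la - i] == '1' && b[lb - i] == '1') ++n;
    }
    return n;
#endif
}
```

Either build gives the same answer for the same input, for example count_common_ones("1101", "0111") is 2; only the implementation behind the call differs, which is what lets installation succeed with or without the optional library.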
Finalise License and Copyright. It is good (and common) practice to clearly attribute authorship. Here, credit is given to the ‘Corels’ team and authors as well as to the authors of the underlying ‘rulelib’ code used by ‘Corels’ via the file inst/AUTHORS (which will be installed as AUTHORS with the package). In addition, the file inst/LICENSE clarifies the GNU GPL-3 license for ‘RcppCorels’ and ‘Corels’, and the MIT license for ‘rulelib’.
Additional Bonus: Some more ‘meta’ files. Several files help to improve the package. For example, .Rbuildignore allows us to exclude listed files from the resulting R package, keeping it well-defined. Similarly, .gitignore can exclude files from being added to the git repository. We also like .editorconfig for consistent editing defaults across a range of modern editors.
Summary
We describe a series of steps to turn the standalone library ‘Corels’ described by Angelino et al. (2017) into an R package RcppCorels using the facilities offered by Rcpp (Eddelbuettel et al., 2022). Along the way, we illustrate key aspects of the R package standards and CRAN Repository Policy, providing a template for other research software wishing to provide their implementations in a form that is accessible to R users.
References
Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2017). “Learning Certifiably Optimal Rule Lists for Categorical Data.” arXiv:1704.01701.
Eddelbuettel D (2019). “RcppCorels: R binding for the ’Certifiably Optimal RulE ListS (Corels)’ Learner.” https://github.com/eddelbuettel/rcppcorels.
Eddelbuettel D (2021). pkgKitten: Create Simple Packages Which Do not Upset R Package Checks. R package version 0.2.2, URL https://CRAN.R-Project.org/package=pkgKitten.
Eddelbuettel D, Emerson JW, Kane MJ (2021). BH: Boost C++ Header Files. R package version 1.78.0-0, URL https://CRAN.R-Project.org/package=BH.
Eddelbuettel D, François R (2011). “Rcpp: Seamless R and C++ Integration.” Journal of Statistical Software, 40(8), 1–18. doi:10.18637/jss.v040.i08. URL https://doi.org/10.18637/jss.v040.i08.
Eddelbuettel D, François R, Allaire J, Ushey K, Kou Q, Russell N, Chambers J, Bates D (2022). Rcpp: Seamless R and C++ Integration. R package version 1.0.8, URL https://CRAN.R-Project.org/package=Rcpp.
Larus-Stone N (2019). “corels: Learning Certifiably Optimal Rule Lists.” https://github.com/nlarusstone/corels. Also online at https://corels.eecs.harvard.edu/corels/.
Appendix 1: Creating the basic package.
edd@rob:~/git$ r --packages Rcpp --eval 'Rcpp.package.skeleton("RcppCorels")'
Attaching package: ‘utils’
The following objects are masked from ‘package:Rcpp’:
.DollarNames, prompt
Creating directories ...
Creating DESCRIPTION ...
Creating NAMESPACE ...
Creating Read-and-delete-me ...
Saving functions and data ...
Making help files ...
Done.
Further steps are described in './RcppCorels/Read-and-delete-me'.
Adding Rcpp settings
>> added Imports: Rcpp
>> added LinkingTo: Rcpp
>> added useDynLib directive to NAMESPACE
>> added importFrom(Rcpp, evalCpp) directive to NAMESPACE
>> added example src file using Rcpp attributes
>> added Rd file for rcpp_hello_world
>> compiled Rcpp attributes
edd@rob:~/git$
edd@rob:~/git$ mv RcppCorels/ rcppcorels # prefer lowercase directories
edd@rob:~/git$
Appendix 2: A Minimal src/Makevars. In the file shown here, use of GMP is unconditional: we define GMP as a compiler flag, and instruct
the linker to link with the GMP library.
CXX_STD = CXX11
PKG_CXXFLAGS = -I. -DGMP
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) -lgmp
Appendix 3: A Placeholder Wrapper.
#include "queue.h"
#include <Rcpp.h>
/*
* Logs statistics about the execution of the algorithm and dumps it to a file.
* To turn off, pass verbosity <= 1
*/
NullLogger* logger;
// [[Rcpp::export]]
bool corels() {
return true; // more to fill in, naturally
}
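The placeholder above only shows that the package compiles and loads. Before real functionality is wired in, the copied sources also need the policy-driven substitutions described in the main text: printf() becomes Rprintf(), and random() / RAND_MAX becomes unif_rand(). One way to keep such changes in a single place is a small compatibility layer, sketched below under stated assumptions. The macro BUILDING_FOR_R and the names SKETCH_PRINTF, sketch_uniform() and pick_index() are hypothetical and not part of RcppCorels; Rprintf() and unif_rand() are the functions declared in the R headers R_ext/Print.h and R_ext/Random.h. Without the macro, the sketch falls back to the standard library so that it compiles outside of R.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

#if defined(BUILDING_FOR_R)        // hypothetical flag, e.g. set in src/Makevars
#include <R_ext/Print.h>           // declares Rprintf()
#include <R_ext/Random.h>          // declares unif_rand()
#define SKETCH_PRINTF Rprintf
// Inside R the caller is expected to have the RNG state set up
// (GetRNGstate() / PutRNGstate()); Rcpp-exported functions
// normally arrange this automatically.
inline double sketch_uniform() { return unif_rand(); }
#else
#define SKETCH_PRINTF std::printf
// Standalone fallback: scaled draw in [0, 1) from the C library generator.
inline double sketch_uniform() { return std::rand() / (RAND_MAX + 1.0); }
#endif

// Library-style code now touches output and randomness only through
// the two substitution points defined above.
inline int pick_index(int n) {
    assert(n > 0);
    int idx = static_cast<int>(sketch_uniform() * n);
    if (idx >= n) idx = n - 1;     // guard against rounding at the upper end
    SKETCH_PRINTF("picked index %d of %d\n", idx, n);
    return idx;
}
```

With this layout the library sources stay free of direct printf() or random() calls, and the choice between R facilities and standard-library fallbacks is made once, at compile time.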