-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Use pybind11 for C/C++ extensions #23787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -302,6 +302,8 @@ def make_release_tree(self, base_dir, files): | |||
setup_requires=[ | |||
"certifi>=2020.06.20", | |||
"numpy>=1.19", | |||
"pybind11>=2.6", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new build dependency, but is not a runtime dependency.
"matplotlib._qhull", ["src/_qhull_wrapper.cpp"], | ||
cxx_std=11, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can specify the C++ standard to follow, so here we are explicitly restricting to C++11. This would probably be a configuration param at the top of the file to make it clear what C++ standard the project is currently working to.
PyTuple_SET_ITEM(tuple, 0, triangles.pyobj()); | ||
PyTuple_SET_ITEM(tuple, 1, neighbors.pyobj()); | ||
return tuple; | ||
return py::make_tuple(triangles, neighbors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good example of improved handling of creating a tuple to return.
: _x(x), | ||
_y(y), | ||
_triangles(triangles), | ||
_mask(mask), | ||
_edges(edges), | ||
_neighbors(neighbors) | ||
{ | ||
if (_x.ndim() != 1 || _y.ndim() != 1 || _x.shape(0) != _y.shape(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argument type checking is handled automatically by pybind11
. Here we need to check that array shapes are consistent. These checks were originally in the wrapper functions that no longer exist.
Can install using Edit: this is no longer a problem. |
I am weakly in favor of this. I think getting away from hand-rolled c-extensions is a good idea. Another down side is that this makes the bet that pybind11 is going to last but given that we already have a transient dependency and scipy has picked up a pybind11 dependency, I think we are past avoiding that risk. |
src/_qhull_wrapper.cpp
Outdated
@@ -125,8 +135,8 @@ class QhullInfo { | |||
/* Delaunay implementation method. | |||
* If hide_qhull_errors is true then qhull error messages are discarded; | |||
* if it is false then they are written to stderr. */ | |||
static PyObject* | |||
delaunay_impl(npy_intp npoints, const double* x, const double* y, | |||
py::tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can stay static (i.e. module-private)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
src/_qhull_wrapper.cpp
Outdated
qhull_error_msg[exitcode], exitcode, | ||
hide_qhull_errors ? "; use python verbose option (-v) to see original qhull error." : ""); | ||
return NULL; | ||
std::ostringstream oss; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I have a small helper in mplcairo for string formatting:
https://github.com/matplotlib/mplcairo/blob/3d9151da64587d0e1116ccd7a425c6e899efdfa0/src/_util.cpp#L93-L95
https://github.com/matplotlib/mplcairo/blob/3d9151da64587d0e1116ccd7a425c6e899efdfa0/src/_mplcairo.cpp#L336
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's nice! I hadn't considered dropping back to Python to do the string formatting.
m.doc() = "Computing Delaunay triangulations.\n"; | ||
|
||
m.def("delaunay", &delaunay, | ||
"delaunay(x, y, /)\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use https://pybind11.readthedocs.io/en/stable/basics.html instead to specify argument names (possibly together with https://pybind11.readthedocs.io/en/stable/reference.html?highlight=kw_only#_CPPv48pos_only if you want to keep them positional-only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done this. It does improve the docstring.
src/_qhull_wrapper.cpp
Outdated
"triangles, neighbors : int arrays, shape (ntri, 3)\n" | ||
" Indices of triangle vertices and indices of triangle neighbors.\n"); | ||
|
||
m.def("version", &version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just make this a lambda []() { return qh_version; }
(returning a char*
is OK).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very happy to use a lambda! There are no lambdas yet in our C++ codebase, but if you think other devs are happy to have them...
&yarray.converter_contiguous, &yarray)) { | ||
return NULL; | ||
} | ||
if (x.ndim() != 1 || y.ndim() != 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually convert my arrays to use unchecked access (https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html#direct-access) (which of course means that out of bounds access can result in anything) and implicitly perform the dimensionality check at the same time as the conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've included examples of both .unchecked<>()
and .data()
to illustrate the possibilities and to minimise the code changes. I would default to using .data()
for these 2 modules as all 2D arrays used have a fixed shape(1)
of either 2 or 3, and use of data
and a hardcoded multiple of 2 or 3 is much faster than using unchecked
which needs to get the strides. For other extensions which accept arbitrarily-shaped arrays then probably unchecked
would be preferred.
The lifetimes of these arrays is strictly controlled by the calling code so I may as well check the dims are OK just once, in the constructor.
See also #9763 which did the same for ft2font. |
Apart from the question of pybind11 lasting, which I cannot really judge beyond that it seems to quite actively maintained at the moment (and historically), I think this is a good idea! @anntzer it is realistic to get #9763 up and running again if it is decided to go with pybind11? (Quite a lot of conflicts, but no idea if they are just from the import section or more complicated than that.) |
That may need a bit of work, but is certainly doable. Or it could be a small project if any newcomer (well-versed in C++ and/or freetype) wants to pick it up :) |
I added it to today's agenda. If we are going to adopt pybind11 we should write up issues to move all of our wrappers over. |
As @anntzer can't make today's meeting we should probably bump discussing pybind11 to next week? |
I am inferring @anntzer is in favor of picking up the dependency, but bumping to next week is reasonable. |
I am +0.5 on picking the dependency. I really like pybind11; OTOH in our specific case the C code is already there (on the third hand switching to pybind11 is probably better for maintainability). |
the examples @ianthomas23 gave made it seem that way, plus the easier maintainability may make that part of the code base more approachable |
91683d3
to
b15fd9c
Compare
I didn't know about this, but I am glad that it exists as I think that wrapping ft2font is a larger task than the 2 extensions in this PR. I think we should discourage use of C++17. NumPy is using C++11 |
This is building and passing tests in CI on Linux and Windows but not macOS. Problem is the mixture of C and C++ files in I can think of a few ways of attempting to solve this, all of them slightly unpleasant. |
c3028c3
to
2f8cfec
Compare
de3f39e
to
846f4c7
Compare
This now passes all CI. Some of it probably isn't yet suitable for production code, but it is a good starting point to demonstrate that the approach could work. |
@@ -62,6 +62,7 @@ install: | |||
- echo %PYTHON_VERSION% %TARGET_ARCH% | |||
# Install dependencies from PyPI. | |||
- python -m pip install --upgrade -r requirements/testing/all.txt %EXTRAREQS% %PINNEDVERS% | |||
- python -m pip install "pybind11<2.10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the pin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall the precise reason, presumably I was experimenting when the build didn't work. But it doesn't matter now, since merging of #24102 we don't need to preinstall pybind11
for editable installs so changes like this can be reverted.
mpl._tri.Triangulation() | ||
|
||
with pytest.raises( | ||
ValueError, match=r'x and y must be 1D arrays of the same length'): | ||
mpl._tri.Triangulation([], [1], [[]], None, None, None, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if None is not accepted anymore this warrants a changelog entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the internal C++ Triangulation API (matplotllib._tri.Triangulation
) that is used by the Python Triangulation class (matplotlib.tri.Triangulation
). It is all private and nobody should be using it, so it doesn't need a changelog entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python Triangulation class still accepts None
for triangles
, etc.
py::arg("edges"), | ||
py::arg("neighbors"), | ||
py::arg("correct_triangle_orientations"), | ||
"Triangulation(x, y, triangles, mask, edges, neighbors)\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of this manually typed signature. (Ditto everywhere.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, following the use of py::arg
.
numpy::array_view<int, ndim> triangles(dims); | ||
int* triangles_ptr = triangles.data(); | ||
int dims[2] = {ntri, 3}; | ||
IndexArray triangles(dims); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't even need to allocate dims on the stack; I usually just write IndexArray triangle{{ntri, 3}};
. Ditto below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. Remember that the idea of this PR is to demonstrate the new approach works and show how to map from the old to the new implementation. There used to be an explicit line declaring the dims
, so the new code has the equivalent line declaring the dims
.
I am going to stop working on this PR. I don't like the solution I came up with for compiling the qhull wrapper (composed of both C and C++ files) as I ended up altering distutils private methods to prevent C++ specific compiler flags being used to compile C code. Instead I will submit a PR just for the triangulation module part of this PR, and I can consider a better approach for the qhull wrapper in time. I'm happy just to close this PR and leave the branch hanging around for future comparison. Alternatively it could be kept open but labelled in some way to indicate that it won't ever be merged? |
Just close the PR. Open PRs anticipate a follow up action on the PR, even if that action is only "decide what to do". |
This is a proof of concept of using
pybind11
to wrap and compile MPL C/C++ extensions. It covers just the_qhull
and_tri
extensions as (1) they are relatively small and self-contained, and (2) I understand them well. The motivation is to provide information to other devs on what it would involve and then consider if this is a suitable approach for all of our C/C++ extensions or not.Advantages:
array_view
class innumpy_cpp.h
aspybind11
includes its own NumPy array wrapper.PyArg_ParseTuple
as we get this for free.list
andtuple
, so no longer need to deal with Python C/API reference counting and error checking.pybind11
catches it and converts it to a Python equivalent exception._qhull.so
and_tri.so
are 1.6 MB and 1.3 MB onmain
but only 1.0 MB and 0.7 MB on this branch.Disadvantages:
I have tried to keep the changes as small as possible to make it clear what the minimal set of changes is. But if we were to go ahead with this approach we would probably want to make more, particularly to
typedef
s and numpy array accessors, to standardise and make the code clearer.Performance of a large
qhull
Delaunay triangulation is essentially unchanged by this PR. Performance of a largetricontour
call is ~3% faster. There is a tradeoff here between performance and code clarity, in particular whether we access multi-dimensional numpy arrays via conventional-looking indexing (e.g.array(j, i)
) which is slower than accessing via low-level C pointers where we do the index maths ourselves for speed.I know of one core dev (@anntzer) and one occasional contributor (me) who have
pybind11
experience, viamplcairo
andcontourpy
respectively.(Edited to add another disadvantage).