-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
API: Expose the dtype C API #25754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Expose the dtype C API #25754
Conversation
Are all these APIs required to be made public in order to enable user-defined DTypes? |
Could you divide this into two PRs:
|
Unfortunately you need to access the dtypemeta struct to initialize a custom dtype. See e.g. here. You also need to use the APIs that take @seberg suggested on Slack to add a typedef for Users of the custom dtypes from python won't care. I think that would let me get rid of all the ifdefs in Does that sound reasonable? I guess we could also provide macros that allow setting and getting the dtypemeta attributes and make |
Nice, it looks like that fix works, see last commit. |
dfb8332
to
20b3a81
Compare
This should be ready to review as far as the code changes go. I still need to look at the documentation. I'm going to try to get the doxygen integration working (although that might be a pain because the codegen forces me to break the doxygen formatting slightly). If that doesn't work, I'll just add the docstrings to the sphinx API docs manually. |
One question I have: how should I test that my changes to |
Excellent, tests are green :) |
e680875
to
2b36426
Compare
#define PyArray_CDoubleDType (*(PyArray_DTypeMeta *)__experimental_dtype_api_table[36]) | ||
#define PyArray_CLongDoubleDType (*(PyArray_DTypeMeta *)__experimental_dtype_api_table[37]) | ||
/* String/Bytes */ | ||
#define PyArray_StringDType (*(PyArray_DTypeMeta *)__experimental_dtype_api_table[38]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never noticed until now but I think it's a bug that we exposed this name in the public DType API. I changed it to PyArray_BytesDType
instead to avoid the python2 baggage, which is its internal name. This is being newly exposed so let's give it a name that makes sense in Python3.
typedef PyTypeObject PyArray_DTypeMeta; | ||
|
||
#endif /* Py_LIMITED_API */ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg there are some comments below about "expected changes" for the ArrayMethod API. Do you still want to eventually do those changes? Now is probably the time...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't notice anything that should change, probably not a big worry. All of this is designed so that we can deprecate/replace it after all.
@@ -141,6 +149,7 @@ typedef struct { | |||
#define NPY_METH_unaligned_contiguous_loop 8 | |||
#define NPY_METH_contiguous_indexed_loop 9 | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_NPY_METH_get_loop
is still marked as private. Maybe we should just make it public given it's pretty useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah, got to make it public, but we could follow up also to keep the diff smaller.
(There are a few things I dislike about it, but we aren't gonna change them quickly, and that is OK)
@@ -187,6 +196,7 @@ typedef NPY_CASTING (resolve_descriptors_with_scalars_function)( | |||
npy_intp *view_offset); | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we give the other typedefs in this file names like PyArrayMethod_ResolveDescriptorsFunction
?
The main thing left now is the typedefs, because I think we probably want to give them names in the |
api_table[37] = &PyArray_PyComplexAbstractDType; | ||
api_table[38] = &PyArray_DefaultIntDType; | ||
/* Non-legacy DTypes that are built in to NumPy */ | ||
api_table[39] = &PyArray_StringDType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think I had suggested that before, but just mentionig it and hoping @mattip can voice an opinion:
An alternative that I rather like would be to add a big chunk of "empty slots" with a common to the api-gen (i.e. just with a + list(range(...))
) and then fill the normal API table here and above with that offset.
There isn't a huge reason for it, but I somehwat like the idea of having fewer structs capsules (in fact, I would support duplicating the ufunc API in the struct so that at some point we can consolidate it all into one).
I may have some tweaks if I review the API, but I itself, but I would prefer to just push this through, and the docs look nice on a quick glance.
(Tweaking this API is completely fine also after RC, the 0 people using it in their wheels already then, can recompile :))
Don't worry about it too much, downstream runs these tests on their CI. |
OK awesome. The last thing that needs to be documented is the typedefs. We have |
That sounds like a good idea. |
I think this is fully ready now. |
Also as a followup we're going to need to fix all the dtypes in numpy-user-dtypes after this is merged. We should probably also add a noisy cron CI job there to let us know if we break public API. |
Awesome, I'll have a look tomorrow, would like to get this in soon even soon (even if we notice some follow ups, which also goes for any naming bike-shedding: happy to apply any before the final release). (I think I may try to see how ugly it is to just fill up a gap in the dtype table, rather than exporting a second capsule, since I don't love the idea of exporting two capsules.) |
The doc build appears to have passed, but just a note that on my local mac setup I'm now hitting sphinx-doc/sphinx#11449 when I do |
I ended up figuring out this was because of seg fault coming from IPython of all places, see sphinx-doc/sphinx#11449 (comment). Anyway, probably nothing to do with this PR. |
4d46968
to
5c96ba8
Compare
5c96ba8
to
eb01d23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed a small doc fixup and the second commit which removes the additional capsule and adds the new API directly (by just reserving slots 320-360 for these, the last of which is empty, but I thought it doesn't matter).
I also moved the table lookup defines into _public_dtype_api_table.h
just to try to not bloat that generate_api.py
script.
I think I prefer that quite a bit, not because it is much prettier code-wise, but because I don't like juggling multiple capsules.
I can see renaming some of the function definitions to include Func
at the end (not the ones with Loop
), and we should see if anyone has any other suggestions.
But for me: I think we can merge it now, ping the mailing list and create issues for anything else. As this is completely new API, I do not see a problem with changing some things during RC phase, also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a look at the new documentation and mostly found some typos, but also some requests for clarification.
Overall, very nice!
I guess one todo item will be to ensure the same user dtypes are updated to the new names...
doc/source/reference/c-api/array.rst
Outdated
|
||
The unaligned loops are currently only used in casts and will never be picked | ||
in ufuncs (ufuncs create a temporary copy to ensure aligned inputs). | ||
These flags are ignored when ``NPY_METH_get_loop`` is defined, where instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These flags -> These functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it to "slot IDs", which I think is a little more accurate than "functions".
doc/source/reference/c-api/array.rst
Outdated
.. c:enumerator:: NPY_METH_REQUIRES_PYAPI | ||
|
||
Indicates the method must hold the GIL. If this flag is not set, the GIL | ||
is released before the ufunc is called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ufunc -> loop? (or does this not hold for casts?)
doc/source/reference/c-api/array.rst
Outdated
.. c:enumerator:: NPY_METH_NO_FLOATINGPOINT_ERRORS | ||
|
||
Indicates the method cannot generate floating errors, so checking for | ||
floating errors after the ufunc completes can be skipped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ufunc -> loop
doc/source/reference/c-api/array.rst
Outdated
created for and the concrete descriptor instances used at runtime, determines | ||
the correct descriptors to pass into the loop implementation, which may or | ||
may not be the descriptors passed in by the user. The *method* is an opaque | ||
pointer to the underlying cast or ufunc loop which is currently exposed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does "which is currently exposed publicly" mean here? I would have expected a (link to a) description of how to use this inside one's own resolution loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, should have been "currently not exposed publicly".
doc/source/reference/c-api/array.rst
Outdated
^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
In addition the above slots, the following slots are exposed to allow | ||
filling the ``PyArray_ArrFuncs`` struct attached to descriptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have a working link to PyArray_ArrFuncs
here. Also, are these still necessary, or just there for backwards compatibility? (e.g., dotfunc
seems pretty pointless, but compare
likely useful)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just backcompat: we should replace them all, but we don't have that yet (e.g. for sorts/compare).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note that this will likely change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, missed that - so ignore my request just now to add such a note!
doc/source/reference/c-api/array.rst
Outdated
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
These macros and static inline functions are provided to allow more | ||
understandable and iomatic code when working with ``PyArray_DTypeMeta`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iomatic -> idiomatic
@@ -48,14 +48,17 @@ supportive role: the :c:data:`PyArrayIter_Type`, the | |||
. The :c:data:`PyArrayIter_Type` is the type for a flat iterator for an | |||
ndarray (the object that is returned when getting the flat | |||
attribute). The :c:data:`PyArrayMultiIter_Type` is the type of the | |||
object returned when calling ``broadcast`` (). It handles iteration | |||
and broadcasting over a collection of nested sequences. Also, the | |||
object returned when calling ``broadcast`` (). It handles iteration and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the ()
? (not new, I know)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no idea, deleted it
Corresponds to the type number for legacy data types. Data | ||
types defined outside of NumPy and possibly future data types | ||
shipped with NumPy will have ``type_num`` set to -1, so this | ||
should be relied on to descriminate between data types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should not be relied on?
The last commit responds to all of Marten's comments. |
Should we get this in once things pass, just so I can rebase the other ones? (Happy if there are follow up changes.) Assuming you were happy with my API table reorganization. |
Yeah no worries about the reorganization, the only reason I didn't do it myself is because I had docs to write still |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beyond the question whether things are c:type
or c:function
, I think you answered/dealt with all my queries, thanks!
I missed a doc build warning which the last commit fixes, along with a reorganization that moves some old docs into a place that makes more sense, allowing me to add a new section heading that I can link to from another page. |
@seberg did you want me to ping the mailing list before we merge this? or only after to alert everyone that it got merged? |
I thought it's more useful aftr as the docs are build? |
I could link to the doc build for this PR: https://output.circle-artifacts.com/output/job/39bc55ca-c9e3-42e5-940d-6224c4c903c8/artifacts/0/doc/build/html/reference/c-api/array.html#arraymethod-api |
My 2¢: merge and ask for comments then - even a major refactoring would at this stage be better done in a different PR! |
p.s. Probably good to note in any message to the mailing list that the new dtypes underlie the new |
OK, fair enough, let's hit the merge button to unblock the other string PRs. |
API: Expose the dtype C API
This exposes the DType API publicly. The functions are exposed using the codegen like the rest of the numpy API. The types were not straightofward to expose using the codegen, so I did it manually, keeping a separate PyCapsule containing a separate API table. Both the main array API table and the new dtype API table are exposed using
import_array()
.