-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
MAINT: Refactor stringdtype casts.c to use cpp templates #28091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: Refactor stringdtype casts.c to use cpp templates #28091
Conversation
220615b
to
d15e583
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anyone else wants to look over this carefully, I strongly recommend the side-by-side diff view.
Most of these comments are formatting related. Overall this looks like a very nice improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peytondmurray the string to longdouble cast is based on NumPyOS_ascii_strtold
, which doesn't need the GIL. That said, the uses of PyErr_Warn
in that function are probably incorrect without the GIL, and we'd need a warning equivalent of npy_gil_error
to re-acquire the GIL and raise the warning to fix that.
That said, it occurs to me that the other string to float casts could probably all use NumPyOS_ascii_strtod
, and that would probably be a lot faster than creating a python float for each entry.
If you're not feeling up to fixing that stuff, I think just adding NEEDS_PYAPI
to the string to longdouble cast is fine and adding a TODO is fine.
Okay, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few more comments but this is looking much closer.
Maybe @mhvk is interested in giving this a look? Or maybe @lysnikolaou or @seberg?
I'm very aware this is a 1000-line diff and no worries if you're busy and this is too much.
Heh, it turns out a user just reported a bug that is way easier to fix once this PR lands since there's no need to mess with macros: #28157. The issue comes down to the float to string casts not accounting for the case where the float value is exactly the na_object of the destination array. I think it's better to land the fix for that issue after this lands - this is already huge and is getting close to being mergeable. |
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
61cd0a1
to
0d46409
Compare
Looks like the 32-bit build broke as a result of the |
Okay, I can't see exactly where the root of the problem lies, but I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, no thorough review but I'll trust Nathan. I noticed memory errors not being handled right. If that is just too much here, let's overlook it (I think it was also wrong in the old code).
I am suspecting that it would be helpful to refactor this more to have a:
template <typename to_type, maybe-enable-if-magic>
struct CastSpec {
PyArrayMethod_Spec spec;
...
}
so that you can group the initializer function with data (such as the name) so that cleanup can also happen there...
Can't pass that out unfortunately if dtype.c
isn't also .cpp
so cleanup would have to jump through hoops.
Of course that would double-down on full-fledged C++, but as we see here there are some hoops that are just annoying in the half state this is in.
However, I think it is probably even better if the "refactor this more" is a follow-up (if it happens). The diff here is already big enough.
casts[cast_i++] = StringToVoidCastSpec; | ||
casts[cast_i++] = VoidToStringCastSpec; | ||
casts[cast_i++] = StringToBytesCastSpec; | ||
casts[cast_i++] = BytesToStringCastSpec; | ||
casts[cast_i++] = NULL; | ||
|
||
// Check that every cast spec is valid | ||
for (int i = 0; i<num_casts; i++) { | ||
assert(casts[i] != NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If things are fully correct error-wise. Then this actually can happen due to OOM.
I dunno what to do about it, because fixing things seems quite awkward (at least right now). Maybe just return NULL
and put a comment that it leaks a bit of memory but this should never happen anyway.
I am also OK if you decide to just keep it as a bug. Since it is a crash that happens at import time and only if OoM, it isn't super concerning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that this can happen due to OOM, but if that's the case the error indicator should be set by getCastSpec
. So maybe the solution could be here to check the error indicator, and return NULL
if it's set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a pass over this. Overall a great improvement. Left a few comments.
Okay, I added a bunch of error handling for the places where allocations weren't being checked before. Let's make sure the tests pass, then I think we'll be good to go! |
I did one more pass with an eye towards resource management and error handling. I didn't spot any issues but I added some comments so future readers don't get tripped up by I'll merge this assuming all the tests pass. Thanks so much for seeing this through. |
Summary
Part of #25693.
This PR replaces several places where macros were being used to generate cast specs for the
vstring
type with c++ templates. No user-facing changes were made.Other changes
but in using
ss#nns
,PyExc_UnicodeEncodeError
expected a string and buffer length for the second and third arguments. I've changed this tosOnns
, to accept abytes
object containing the offending data.Notes
PyArray_DTypeMeta
structs. I'm not sure if there is a better way to do this, but if you have any ideas about this I'd be grateful to learn how this could be done more elegantly (or if these would make sense to have in a different place in the code - I could see them being useful in more places than just this...):typenum_to_shortname_cstr
- I realize not all NPY_TYPES have corresponding shortnames, but these were the ones necessary to finish this efforttypenum_to_cstr
typenum_to_dtypemeta
char
can't be initialized with string literalssize_t
|
, e.g.NPY_METH_NO_FLOATINGPOINT_ERRORS | NPY_METH_REQUIRES_PYAPI
is not of typeNPY_ARRAYMETHOD_FLAGS
, it is of typeint
. Explicit casts were added to ensure the correct typegoto
was jumping over a variable initialization, which isn't allowed by the cpp compiler. The instance was modified to work without the jump.@ngoldbaum I noticed that the string to f128 cast has only the
NPY_METH_NO_FLOATINGPOINT_ERRORS
array method flag. But if you look atstring_to_float
, orstring_to_float_resolve_descriptors
, it looks like there are lines that are accessing python objects, callingPy_INCREF
, etc. I've kept the array flags identical to how they were before, but should this also have aNPY_METH_REQUIRES_PYAPI
flag as well?