-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: fixes for three related stringdtype issues #26436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmm, looks like this breaks some other things, will investigate and update. |
8ebdc7a
to
f7a20cc
Compare
Should be good now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, changes look good, I am not sure the has-ref logic is quite spot on and if we should refine it.
Adding that comment would be nice.
Also thanks so much for the careful review! I completely missed the issue with the strides. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the follow-ups, I like them!
* BUG: fix broken fancy indexing for stringdtype * BUG: fix incorrect casting for stringdtype in PyArray_Where * BUG: ensure casting to bool and nonzero treats null strings the same way * MNT: refactor so itemsizes are correct * MNT: refactor so trivial copy check aligns with casting setup
BUG: fixes for three related stringdtype issues (#26436)
Fixes #26420.
While working on the issue with
where
found in #26240, I noticed two other related issues. Since the fix for where depends on the other two fixes, I figured it would make sense to send them all as one PR.First, it turns out that advanced indexing was broken for indexing any entry in an array that needs a heap allocation. On current main, this leads to errors or segfaults:
The fix is to properly set the output descriptor in the casting setup in
array_subscript
.Second, I noticed that the
nonzero
function completely ignores nulls. I also noticed that the string to bool cast assumed all nulls are truthy and ignored the existence of nulls likeNone
that should be falsey, following the behavior of object array:This updates both
nonzero
and the string to bool cast to account for this and makes sure they behave identically.Finally, the issue with
where
is also caused by not setting the input and output descriptors properly in the cast setup inPyArray_Where
. The casting code here dates back to #23770, which was before stringdtype had an arena allocator. With the arena allocator, we need to be more careful about bookkeeping on input and output descriptors. This means we also need to setup a separate cast for the second input descriptor.Also adds tests for all three fixed issues.