BUG: fixes for three related stringdtype issues (#26436) #26459
Merged
Backport of #26436.
Fixes #26420.
While working on the issue with `where` found in #26240, I noticed two other related issues. Since the fix for `where` depends on the other two fixes, I figured it would make sense to send them all as one PR.

First, it turns out that advanced indexing was broken for indexing any entry in an array that needs a heap allocation. On current main, this leads to errors or segfaults.
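
A minimal sketch of the kind of indexing operation affected, not the original reproducer from the issue. It assumes NumPy >= 2.0 (where `numpy.dtypes.StringDType` exists) and assumes that 1000-character strings are long enough to be stored out of line. The helper name `fancy_index_long_strings` is made up for illustration.

```python
try:
    import numpy as np
    from numpy.dtypes import StringDType  # requires NumPy >= 2.0
except ImportError:  # NumPy is missing or too old to provide StringDType
    np = StringDType = None


def fancy_index_long_strings():
    """Index a StringDType array whose entries are too long to store inline."""
    # Assumption: 1000 characters is well past the inline short-string limit,
    # so every entry needs a separate allocation.
    long_strings = ["a" * 1000, "b" * 1000, "c" * 1000]
    arr = np.array(long_strings, dtype=StringDType())
    # Integer-array ("advanced") indexing copies the selected entries
    # into a newly allocated array.
    picked = arr[[2, 0]]
    return picked.tolist()


if StringDType is not None:
    # Show only the first three characters of each selected string.
    print([s[:3] for s in fancy_index_long_strings()])
```

With the fix applied, the selected entries should compare equal to the third and first input strings, in that order.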
The fix is to properly set the output descriptor in the casting setup in `array_subscript`.

Second, I noticed that the `nonzero` function completely ignores nulls. I also noticed that the string to bool cast assumed all nulls are truthy and ignored the existence of nulls like `None` that should be falsey, following the behavior of object arrays.

This updates both `nonzero` and the string to bool cast to account for this and makes sure they behave identically.

Finally, the issue with `where` is also caused by not setting the input and output descriptors properly in the cast setup in `PyArray_Where`. The casting code here dates back to #23770, which was before stringdtype had an arena allocator. With the arena allocator, we need to be more careful about bookkeeping on input and output descriptors. This means we also need to set up a separate cast for the second input descriptor.

Also adds tests for all three fixed issues.
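
The intended behavior after the second and third fixes can be sketched roughly as follows. This is an illustration under assumptions, not the tests added in this PR: it assumes NumPy >= 2.0 with these fixes included, uses `None` as the `na_object`, and the helper names `null_truthiness` and `where_long_strings` are hypothetical.

```python
try:
    import numpy as np
    from numpy.dtypes import StringDType  # requires NumPy >= 2.0
except ImportError:  # NumPy is missing or too old to provide StringDType
    np = StringDType = None


def null_truthiness():
    """Compare the string-to-bool cast with nonzero() for a None null."""
    dtype = StringDType(na_object=None)
    arr = np.array(["hello", "", None], dtype=dtype)
    # Both operations should agree: the empty string and the None null
    # are falsey, matching what an object array would do.
    as_bool = arr.astype(bool).tolist()
    nonzero_idx = arr.nonzero()[0].tolist()
    return as_bool, nonzero_idx


def where_long_strings():
    """Select between two StringDType arrays holding long strings."""
    # Assumption: 1000-character strings are stored out of line, which is
    # the case where descriptor bookkeeping in the cast setup matters.
    a = np.array(["x" * 1000, "y" * 1000], dtype=StringDType())
    b = np.array(["p" * 1000, "q" * 1000], dtype=StringDType())
    return np.where([True, False], a, b).tolist()


if StringDType is not None:
    print(null_truthiness())
    print([s[:1] for s in where_long_strings()])
```

The two helpers are independent. The first checks that the cast and `nonzero` give consistent answers, and the second exercises `where` with two separate input arrays, which is the situation that needs a cast for each input descriptor.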
BUG: fix broken fancy indexing for stringdtype
BUG: fix incorrect casting for stringdtype in PyArray_Where
BUG: ensure casting to bool and nonzero treats null strings the same way
MNT: refactor so itemsizes are correct
MNT: refactor so trivial copy check aligns with casting setup