ngoldbaum
Member

Fixes #28269.

It turns out test_scalars_string_conversion was testing the old buggy conversion 🙃.

Is it maybe problematic to assume the bytes are UTF-8? Before this change we were doing something completely nonsensical, so we're free to make a choice here. I think the built-in NumPy bytes dtype assumes everything is ASCII, which is maybe less useful than letting people pass in arbitrary UTF-8?
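
To make the trade-off concrete, here is a minimal pure-Python sketch of the two decoding choices. It is not the C implementation in this PR; `decode_bytes_item` is a hypothetical helper used only for illustration, and it assumes that `str` items are passed through unchanged.

```python
def decode_bytes_item(item, encoding="utf-8"):
    """Hypothetical helper: decode bytes to str, pass str through unchanged."""
    if isinstance(item, bytes):
        return item.decode(encoding)
    return item


# Stand-in for the contents of an object array that mixes bytes and str.
data = [b"abc", "caf\u00e9".encode("utf-8"), "plain str"]

# Treating the bytes as UTF-8 accepts ASCII and non-ASCII input alike.
utf8_result = [decode_bytes_item(x) for x in data]

# Treating the bytes as ASCII rejects the second item.
try:
    ascii_result = [decode_bytes_item(x, encoding="ascii") for x in data]
except UnicodeDecodeError:
    ascii_result = None

print(utf8_result)
print(ascii_result)
```

Since ASCII is a subset of UTF-8, every input that decodes under the ASCII assumption decodes identically under UTF-8, so choosing UTF-8 only widens the set of accepted inputs. Bytes that are not valid UTF-8 would still raise an error in this sketch.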

We could also probably do this faster without going through the Python C API, but that can be a future pass if anyone notices.

@ngoldbaum ngoldbaum added the component: numpy.strings String dtypes and functions label Feb 4, 2025
@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Feb 5, 2025
Member

charris commented Feb 5, 2025

Rerunning the Linux build worked. I don't know what Hypothesis found that caused the error in the first run, but I doubt it was anything introduced by this PR.

@charris charris merged commit b3d045b into numpy:main Feb 5, 2025
68 checks passed
Member

charris commented Feb 5, 2025

Thanks Nathan.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Feb 5, 2025
Labels
00 - Bug component: numpy.strings String dtypes and functions
Development

Successfully merging this pull request may close these issues.

BUG: Inconsistent conversion of object-type bytes array to StringDType