ENH: Implement string comparison ufuncs (or almost) #21716
Merged
Backport of #21041.
This makes all comparison operators and ufuncs work on strings
using the ufunc machinery.
It requires a half-manual "ufunc" to keep supporting void comparisons
and especially `np.compare_chararrays` (that one may have a bit more
overhead now).
In general the new code should be much faster, and it leaves plenty of
room for easy optimizations. It is also much simpler since it can outsource
some complexities to the ufunc/iterator machinery.
This further fixes a couple of bugs with byte-swapped strings.
The backward-compatibility-related change is that using the normal
ufunc machinery means that comparisons between string and
unicode now give a `FutureWarning` (instead of just False).

C++ does not like it (at least not before C++20)... GCC and clang
don't seem to mind, but MSVC seems to.
BENCH: Add basic string comparison benchmarks
DOC,STY: Fixup string-comparisons comments based on review
Thanks to Marten's comments, a few clarifications and slight fixups.
ENH: Use `memcmp` because it may be faster for the byte case
TST: Improve string and unicode comparison tests.
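To illustrate the `memcmp` commit above, here is a minimal sketch of a three-way comparison over fixed-width, NUL-padded strings that takes a `memcmp` fast path when the character type is one byte wide. The name `fixed_width_cmp` and its signature are hypothetical and chosen for illustration only; this is not the code from `string_ufuncs.cpp`, and it assumes trailing NULs count as padding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, NOT NumPy's actual function: three-way comparison of
// two fixed-width strings whose unused tail is padded with NUL characters.
// Returns a negative value, 0, or a positive value, like memcmp.
template <typename character>
int fixed_width_cmp(const character *a, std::size_t len_a,
                    const character *b, std::size_t len_b)
{
    const std::size_t n = std::min(len_a, len_b);

    if (sizeof(character) == 1) {
        // Byte case: memcmp compares the common prefix as unsigned bytes
        // and is typically well optimized by the C library.
        const int cmp = std::memcmp(a, b, n);
        if (cmp != 0) {
            return cmp;
        }
    }
    else {
        // Wider characters (e.g. char32_t): compare unit by unit, because a
        // byte-wise memcmp would not order multi-byte units correctly on
        // little-endian machines.
        for (std::size_t i = 0; i < n; ++i) {
            if (a[i] != b[i]) {
                return a[i] < b[i] ? -1 : 1;
            }
        }
    }

    // The common prefix is equal. The longer operand is greater only if its
    // remaining tail contains something other than NUL padding.
    for (std::size_t i = n; i < len_a; ++i) {
        if (a[i] != 0) {
            return 1;
        }
    }
    for (std::size_t i = n; i < len_b; ++i) {
        if (b[i] != 0) {
            return -1;
        }
    }
    return 0;
}
```

A call such as `fixed_width_cmp("abc", 3, "abd", 3)` takes the `memcmp` path, while `fixed_width_cmp(U"ab", 2, U"aa", 2)` takes the per-character loop. Whether the fast path actually wins depends on the string lengths and the platform, which is why the commit message only says it "may be faster".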
MAINT: Use switch statement based on review
As suggested by Serge.
Co-authored-by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
The issue is that the `view` needs to use native byte-order, so
just ensure native byte-order for the view, and then do another cast
to get it right.
BUG: Add `np.compare_chararrays` to test and fix typo
TST: Add test for empty string comparisons
TST: Fixup string test based on Marten's review
MAINT: Move definitions back into string_ufuncs.h
MAINT: Use enum class for comparison operator templating
This removes the need for a dynamic (or static) assert in the
switch statement.
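As a rough illustration of the idea in this commit, the sketch below templates a small function on a scoped enum and dispatches with a `switch` that lists every enumerator. The names `Comp` and `apply_comparison` are hypothetical, not the identifiers used in the PR. Because the template argument must be a value of the scoped enum rather than an arbitrary integer, no assert is needed to guard against an invalid operator.

```cpp
#include <cassert>

// Hypothetical names for illustration; not the identifiers used in NumPy.
enum class Comp { EQ, NE, LT, LE, GT, GE };

// Turns a three-way comparison result (negative, zero, or positive) into the
// boolean answer for the comparison operator selected at compile time.
template <Comp comp>
bool apply_comparison(int cmp)
{
    // `comp` is a compile-time constant, so an optimizing compiler can drop
    // the branches that are not taken in each instantiation.
    switch (comp) {
        case Comp::EQ: return cmp == 0;
        case Comp::NE: return cmp != 0;
        case Comp::LT: return cmp < 0;
        case Comp::LE: return cmp <= 0;
        case Comp::GT: return cmp > 0;
        case Comp::GE: return cmp >= 0;
    }
    // Not reached for the enumerators above; present only so that compilers
    // do not warn about a missing return value.
    return false;
}
```

For example, `apply_comparison<Comp::LT>(-1)` yields `true` and `apply_comparison<Comp::GE>(-1)` yields `false`. Listing every enumerator without a `default` label also lets compilers that support it (for instance via `-Wswitch` in GCC and clang) flag a forgotten case if the enum grows.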
Template version of add_loop to avoid redundant code
STY: Fixup style, two spaces, error is -1
STY: Small `string_ufuncs.cpp` fixups based on Serge's review
MAINT: Fix merge conflict (ensure_dtype_nbo was removed)
Co-authored-by: Serge Guelton <serge.guelton@telecom-bretagne.eu>