-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Implement string comparison ufuncs (or almost) #21041
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
NPY_NO_EXPORT PyObject * | ||
_umath_strings_richcompare( | ||
PyArrayObject *self, PyArrayObject *other, int cmp_op, int rstrip); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this how we do this for C++? I am not sure which header I could cram this in so that i can just include it from multiarray/multiarrayobject.c
and multiarray/arrayobject.c
which need access to these. Currently I put it there directly, which is OK, but I don't like it too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not in an string_ufuncs.h
?
@serge-sans-paille could you vet the C++ again? Maybe you also know how to reduce those annoying repetitions when creating the many different loops? That would be awesome :). EDIT: MSVC doesn't like those named struct initializers? C++ 11 really doesn't allow/cover this simple C99 feature?! |
c549c66
to
26f0be6
Compare
NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE | NPY_ITER_ALIGNED}; | ||
|
||
PyArrayMethod_StridedLoop *strided_loop = nullptr; | ||
NPY_BEGIN_THREADS_DEF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ forced me to move all the inits up, or it didn't want the goto, another C99 feature not included in C11?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if flagging these variables as constexpr
changes something wrt. goto
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tried for the flags, and it didn't seem to work.
26f0be6
to
c7dbb38
Compare
Windows is not happy. Do we have benchmarks that are affected by this change? |
fd098d9
to
aa9a9f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg - I had a look mostly just because I was curious both how your new dtypes held up here and how the new templating works. Both seem great! My comments are really just about trying to make comments clearer; you probably want one of the more c++ people to have a look.
@@ -214,7 +214,7 @@ typedef struct { | |||
} PyArrayMethod_Spec; | |||
|
|||
|
|||
typedef PyObject *_ufunc_addloop_fromspec_func( | |||
typedef int _ufunc_addloop_fromspec_func( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems unrelated to the refactoring part and probably deserves a commit on its own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, although I don't care about when/how it gets in, hehe
/* From umath/string_ufuncs.cpp/h */ | ||
NPY_NO_EXPORT PyObject * | ||
_umath_strings_richcompare( | ||
PyArrayObject *self, PyArrayObject *other, int cmp_op, int rstrip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not including the according header instead of forward-declaring that function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There be dragons... I am not quite sure how big they are but there are difficulties with crossing the multiarray/umath boundary.
template <typename character> | ||
static NPY_INLINE int | ||
character_cmp(character a, character b) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could that be return static_cast<int>(b) - static_cast<int>(a)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, although I am not sure in case of invalid unicode characters (they are probably unsigned, but that would be out of range). Happy to change if think it makes it much clearer? (I don't have an opinion either way.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, maybe something other than an in is better as typename std::intermediate_type<npy_ucs4, npy_byte>::type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly still don't really see that this will lead to easier to read code :), and I assume the compilers should optimize it just fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite, and that's why I'm trying to convince you :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but that is not what we are comparing? We are comparing these:
https://godbolt.org/z/f5qh9a3b7
And there func1
func0
actually ends up shorter in some cases (depends on the operand)?
EDIT: sorry, I am silly, no since this first is applies on the full word, and only then to an integer. So I guess it may compile to something faster, hmmmm.
else if (comp == Py_GE) { | ||
res = cmp >= 0; | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as comp
is a template parameter, this could be a static_assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought so first, but my linter didn't like static_assert(0)
? But probably just a bad linting hint, not fully understanding the templating, will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Edit: oops different window, so missed my own comment ;))
I just tried it again, and even the compiler complained, not just the linter?
NPY_NO_EXPORT PyObject * | ||
_umath_strings_richcompare( | ||
PyArrayObject *self, PyArrayObject *other, int cmp_op, int rstrip); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not in an string_ufuncs.h
?
NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE | NPY_ITER_ALIGNED}; | ||
|
||
PyArrayMethod_StridedLoop *strided_loop = nullptr; | ||
NPY_BEGIN_THREADS_DEF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if flagging these variables as constexpr
changes something wrt. goto
?
aa9a9f1
to
623cea5
Compare
IMO, the main point here is making the string comparisons sane and adding them as ufuncs to The benchmarks proof my point that this is generally much faster:(This time, smaller ratio is better)
|
@seberg - those are very impressive improvements!! I'm rather puzzled by this, though:
How is it possible that '!=' is 50% slower than '=='?? I would have expected the two to be identical (as was the case in the old code). And indeed, all the other comparisons are the same (within uncertainties) as '=='. The same 50% penalty holds when not using
As well as for the 2-d case. |
Hmmm, no idea, without thinking about it, I would have assumed that all comparisons will have the same speed. Is there some reason the compiler can optimize some path more than the other, or just random? |
The timing differences are very systematic: all comparisons are the same except '!=' for 'U'. I cannot see any reason for it in your code; after all, the base function doing the comparisons does not even know what comparison is being done; it just returns -1, 0, or +1. Maybe it is some kind of weird fluke? |
I.e., maybe worth running the benchmarks again? |
Its reproducible on my comp, also if I change the order in which the benchmarks run. The compiler must be spitting something different for |
Weird, I just thought I'd try to reproduce it too, and I cannot:
(For both, current master gives 3.46 ms, so I confirm the great speed-up! I also tried with |
To be honest, I currently don't care much about this oddness related to the core of the loop right now. Could be compiler version or even hardware differences (gcc 11.2 and a Ryzen 3 here) 🤷. If anyone to squeeze performance, I would suggest implementing the |
@seberg - agreed that in the larger scheme of things this just doesn't matter. The code is much nicer and faster, what else could one ask for!? |
OK, I think I covered all the important paths, so this should be ready now for another review and generally ready modulo smaller header or other cleanups. (We also really need to deprecate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Mostly just a reiteration of @serge-sans-paille's suggestion to create an 'string_ufuncs.h', plus a small nit on one of the tests.
Though I'd prefer a more C++ adept person to sign off on that part!
} | ||
|
||
|
||
extern "C" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like below: I think it makes sense to create a 'strings_ufunc.h' that contains these definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @mattip, you have a good idea? I had that first, and it should work for the init_string_ufuncs
, but the umath/multiarray boundary is always creating some headaches: it seems hard to include that header from multiarray
somehow?
I admit, it is very likely I am just doing something stupid.
Also @serge-sans-paille, I thought another problem was that the symbol export didn't actually work right when I had it as its own header here. But I guess that may have been just a bad try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe good to separate what should more obviously work, which is to put the ones double-defined here in 'strings_ufunc.h' and import that new .h above. And only then worry about the other places.
Regarding multiarray/umath, I thought we had tried to harmonize those, but not sure...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are now one build-unit, which is why this works at all without much hassle. But including umath stuff in multiarray is still tricky (I am not exactly sure why)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, moved things back, it is working fine, but did not try to cross the multiarray
boundary yet. I guess one thing could be that I did not actually include other stuff in the header right now (not sure if I should).
template <typename character> | ||
static NPY_INLINE int | ||
character_cmp(character a, character b) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, maybe something other than an in is better as typename std::intermediate_type<npy_ucs4, npy_byte>::type
res = cmp >= 0; | ||
break; | ||
default: | ||
assert(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the right static_assert
here would be (in the function prelude`
static_assert(Py_EQ == comp || Py_NE == comp || ..., "valid comparison");
or something based on a range if you know for sure which enumerator is the lowest one.
case Py_GE: | ||
return string_comparison_loop<rstrip, Py_GE, character>; | ||
default: | ||
assert(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, seeing the same assert again, what you probably should do is
static constexpr is_valid_comp(int comp) {
return comp == Py_LT || ...;
}
and use it either in fonction prelude, either as an assert
or a static_assert
depending on the context
Or even better: make the int an enum class
but that's probably not possible, just mentioning it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I didn't love the idea of a long duplicate Py_EQ || ...
where the asserts were, but if it goes into the function prelude, I can see the point, will adopt :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang static analysis seems to not be quite smart enough to realize that the switch will always match, but I guess I will ignore that (it warns that res
may be undefined in the other case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I could use default
for the last one, but dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, silly me... This assert is actually not static at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the verboseness. Pushed a change with that the enum class instead, figured that the static assert was just as verbose anyway...
62cbfc3
to
7611b3d
Compare
@seberg : feel free to cherry-pick serge-sans-paille@0bf1a47 |
Done that cherry pick, but also an STY commit because some style stuff I just can't agree with ;). Left some long lines, wasn't sure how to break them, and I think they are somewhat fine (i.e. nothing non-obvious hidden), although not great. I am happy with this, it is nice that it removes the duplication, it adds a bit of "C++" knowledge requirement to truly understand what is going on, but I guess that is OK. |
1b729c0
to
11d91ac
Compare
@seberg, @charris, just "a gentle hi". Recently I want to contribute a PR about string ufunc in Numba (numba/numba#8104), which is empty PR right now. I discussed with Numba members, and we hope to keep consistent with NumPy handling way. And I saw this PR! Very impressive and we need it (especially |
* ENH: Implement string comparison ufuncs (or almost) This makes all comparison operators and ufuncs work on strings using the ufunc machinery. It requires a half-manual "ufunc" to keep supporting void comparisons and especially `np.compare_chararrays` (that one may have a bit more overhead now). In general the new code should be much faster, and has a lot of easier optimization potential. It is also much simpler since it can outsource some complexities to the ufunc/iterator machinery. This further fixes a couple of bugs with byte-swapped strings. The backward compatibility related change is that using the normal ufunc machinery means that string comparisons between string and unicode now give a `FutureWarning` (instead of just False). * MAINT: Do not use C99 tagged struct init in C++ C++ does not like it (at least not before C++20)... GCC and clang don't seem to mind, but MSVC seems to. * BENCH: Add basic string comparison benchmarks * DOC,STY: Fixup string-comparisons comments based on review Thanks to Marten's comments, a few clarfications and slight fixups. * ENH: Use `memcmp` because it may be faster for the byte case * TST: Improve string and unicode comparison tests. * MAINT: Use switch statement based on review As suggested be Serge. Co-authored-by: Serge Guelton <serge.guelton@telecom-bretagne.eu> * TST: Make unicode byte-swap test slightly more concrete The issue is that the `view` needs to use native byte-order, so just ensure native byte-order for the view, and then do another cast to get it right. * BUG: Add `np.compare_chararrays` to test and fix typo * TST: Add test for empty string comparisons * TST: Fixup string test based on martens review * MAINT: Move definitions back into string_ufuncs.h * MAINT: Use enum class for comparison operator templating This removes the need for a dynamic (or static) assert in the switch statement. * Template version of add_loop to avoid redundant code * STY: Fixup style, two spaces, error is -1 * STY: Small `string_ufuncs.cpp` fixups based on Serge's review * MAINT: Fix merge conflict (ensure_dtype_nbo was removed) Co-authored-by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
The backport is in #21716. |
Oh we are giving a backport a shot? Will there be another (brief) rc? I don't really expect this to create issues, but it might have some very subtle changes. I don't mind backporting, I do wonder though. @dlee992 are you interested in NumPy supporting |
Yes, there are already > 10 backports since rc2. And then I just want to sit on it for two weeks or so and stop doing that :) The zero release is the true NumPy rc, since that is when widespread testing really happens. |
If @dlee992 decides that a backport will not serve, the we can close the open PR for that. |
@seberg, hi. I think I just need NumPy supporting
@charris, hi, I need the very backport. I am glad it can be backported to 1.23.0. Thanks for ur quick response very much! |
@dlee992 We should be able to expand the output of Note that |
@seberg, maybe I can try to follow these parts of NumPy code, and try to extend
I'd guess these related stuff are implemented in |
Of course, the problem is timing. If we really need this in the next release we have to do it super-fast and convince Chuck that backporting matters. It is probably not hard, but it seems a bit steep to me.
This happens in A "workaround" might be to add a "legacy" loop that is never used and just does nothing/sets an error if used. But that is pretty annoying. |
@seberg, many thanks! But in fact, I cannot follow ur thought about extending PS, I think I am not that hurry about extending Thanks again! |
If there is consensus that returning two extra entries in We are now in the third rc for 1.23, so I would not expect many more changes unless something comes up. In the end it is up to Chuck though. The following is what returns numpy/numpy/core/src/umath/ufunc_object.c Lines 6336 to 6368 in 703a6fa
As you can see it reads
|
@dlee992 for your awareness. It seems we should (and will) pull this out of NumPy 1.23. It leads to subtle changes for subclasses (subclasses were sometimes ignored before). That leads to subtle changes/breaks in MaskedArrays paths (that were arguably always a bit bad). And those breaks domino into failures for |
@seberg, thanks! I will stay tuned about this ENH. |
This makes all comparison operators and ufuncs work on strings
using the ufunc machinery.
It requires a half-manual "ufunc" to keep supporting void comparisons
and especially
np.compare_chararrays
(that one may have a bit moreoverhead now).
In general the new code should be much faster, and has a lot of easier
optimization potential. It is also much simpler since it can outsource
some complexities to the ufunc/iterator machinery.
This further fixes a couple of bugs with byte-swapped strings.
The backward compatibility related change is that using the normal
ufunc machinery means that string comparisons between string and
unicode now give a
FutureWarning
(instead of just False).This also shows two things:
I do admit, that I think we will have to refactor some of this when we get more ufuncs created in this way. Note that for now
np.equal.types
will not listSS->?
andUU->?
.In general, this is finished, however, it should get a bit of new tests for direct
np.equal
, etc. ufunc use and for byte-swapped or unaligned unicode inputs.