ENH: Implement string comparison ufuncs (or almost) #21041

seberg · 2022-02-12T22:27:10Z

This makes all comparison operators and ufuncs work on strings
using the ufunc machinery.
It requires a half-manual "ufunc" to keep supporting void comparisons
and especially np.compare_chararrays (that one may have a bit more
overhead now).

In general the new code should be much faster, and has a lot of easier
optimization potential. It is also much simpler since it can outsource
some complexities to the ufunc/iterator machinery.

This further fixes a couple of bugs with byte-swapped strings.

The backward compatibility related change is that using the normal
ufunc machinery means that string comparisons between string and
unicode now give a FutureWarning (instead of just False).

This also shows two things:

How to use new-style loops (very simple one)
Writes them in C++ templating.

I do admit, that I think we will have to refactor some of this when we get more ufuncs created in this way. Note that for now np.equal.types will not list SS->? and UU->?.

In general, this is finished, however, it should get a bit of new tests for direct np.equal, etc. ufunc use and for byte-swapped or unaligned unicode inputs.

seberg · 2022-02-12T22:29:28Z

numpy/core/src/umath/string_ufuncs.cpp

+    NPY_NO_EXPORT PyObject *
+    _umath_strings_richcompare(
+            PyArrayObject *self, PyArrayObject *other, int cmp_op, int rstrip);
+}


Is this how we do this for C++? I am not sure which header I could cram this in so that i can just include it from multiarray/multiarrayobject.c and multiarray/arrayobject.c which need access to these. Currently I put it there directly, which is OK, but I don't like it too much.

Why not in an string_ufuncs.h ?

seberg · 2022-02-12T22:30:48Z

@serge-sans-paille could you vet the C++ again? Maybe you also know how to reduce those annoying repetitions when creating the many different loops? That would be awesome :).

EDIT: MSVC doesn't like those named struct initializers? C++ 11 really doesn't allow/cover this simple C99 feature?!
EDIT2: Oh well, expanding it into multiple-lines rather than the {} is not terrible...

numpy/core/src/multiarray/convert_datatype.h

seberg · 2022-02-12T23:25:24Z

numpy/core/src/umath/string_ufuncs.cpp

+            NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE | NPY_ITER_ALIGNED};
+
+    PyArrayMethod_StridedLoop *strided_loop = nullptr;
+    NPY_BEGIN_THREADS_DEF;


C++ forced me to move all the inits up, or it didn't want the goto, another C99 feature not included in C11?

I wonder if flagging these variables as constexpr changes something wrt. goto ?

Just tried for the flags, and it didn't seem to work.

mattip · 2022-02-13T06:34:43Z

Windows is not happy.

Do we have benchmarks that are affected by this change?

mhvk

@seberg - I had a look mostly just because I was curious both how your new dtypes held up here and how the new templating works. Both seem great! My comments are really just about trying to make comments clearer; you probably want one of the more c++ people to have a look.

numpy/core/src/umath/string_ufuncs.cpp

serge-sans-paille · 2022-02-13T18:47:11Z

numpy/core/include/numpy/experimental_dtype_api.h

@@ -214,7 +214,7 @@ typedef struct {
 } PyArrayMethod_Spec;


-typedef PyObject *_ufunc_addloop_fromspec_func(
+typedef int _ufunc_addloop_fromspec_func(


This change seems unrelated to the refactoring part and probably deserves a commit on its own

True, although I don't care about when/how it gets in, hehe

numpy/core/src/multiarray/convert_datatype.h

serge-sans-paille · 2022-02-13T18:49:58Z

numpy/core/src/multiarray/multiarraymodule.c

+/* From umath/string_ufuncs.cpp/h */
+NPY_NO_EXPORT PyObject *
+_umath_strings_richcompare(
+        PyArrayObject *self, PyArrayObject *other, int cmp_op, int rstrip);


why not including the according header instead of forward-declaring that function?

There be dragons... I am not quite sure how big they are but there are difficulties with crossing the multiarray/umath boundary.

serge-sans-paille · 2022-02-13T18:53:23Z

numpy/core/src/umath/string_ufuncs.cpp

+template <typename character>
+static NPY_INLINE int
+character_cmp(character a, character b)
+{


could that be return static_cast<int>(b) - static_cast<int>(a) ?

Yeah, although I am not sure in case of invalid unicode characters (they are probably unsigned, but that would be out of range). Happy to change if think it makes it much clearer? (I don't have an opinion either way.)

yeah, maybe something other than an in is better as typename std::intermediate_type<npy_ucs4, npy_byte>::type

I honestly still don't really see that this will lead to easier to read code :), and I assume the compilers should optimize it just fine?

Not quite, and that's why I'm trying to convince you :-)

https://godbolt.org/z/GMWPKE4EK

Yeah, but that is not what we are comparing? We are comparing these:
https://godbolt.org/z/f5qh9a3b7

And there ~~func1~~ func0 actually ends up shorter in some cases (depends on the operand)?

EDIT: sorry, I am silly, no since this first is applies on the full word, and only then to an integer. So I guess it may compile to something faster, hmmmm.

numpy/core/src/umath/string_ufuncs.cpp

serge-sans-paille · 2022-02-13T18:58:39Z

numpy/core/src/umath/string_ufuncs.cpp

+        else if (comp == Py_GE) {
+            res = cmp >= 0;
+        }
+        else {


as comp is a template parameter, this could be a static_assert

Yeah, I thought so first, but my linter didn't like static_assert(0)? But probably just a bad linting hint, not fully understanding the templating, will change.

(Edit: oops different window, so missed my own comment ;))

I just tried it again, and even the compiler complained, not just the linter?

numpy/core/src/umath/string_ufuncs.cpp

serge-sans-paille · 2022-02-13T19:21:36Z

numpy/core/src/umath/string_ufuncs.cpp

+    NPY_NO_EXPORT PyObject *
+    _umath_strings_richcompare(
+            PyArrayObject *self, PyArrayObject *other, int cmp_op, int rstrip);
+}


Why not in an string_ufuncs.h ?

serge-sans-paille · 2022-02-13T19:22:48Z

numpy/core/src/umath/string_ufuncs.cpp

+            NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE | NPY_ITER_ALIGNED};
+
+    PyArrayMethod_StridedLoop *strided_loop = nullptr;
+    NPY_BEGIN_THREADS_DEF;


I wonder if flagging these variables as constexpr changes something wrt. goto ?

seberg · 2022-02-13T21:03:33Z

IMO, the main point here is making the string comparisons sane and adding them as ufuncs to np.equal, etc. to begin with. But benchmarks showed a slightly different performance "profile" for S with more characters and when things match. I am not convinced that it is actually better how it was before (unless we anticipate long strings), but for now changed it, more than happy to revert though!

The benchmarks proof my point that this is generally much faster:

(This time, smaller ratio is better)

       before           after         ratio
     [d9538400]       [623cea57]
     <main>           <string-ufuncs>
+         104±1μs        118±0.4μs     1.14  bench_strings.StringComparisons.time_compare_identical(10000, 'U', True, '!=')
+      54.6±0.9μs      60.3±0.09μs     1.10  bench_strings.StringComparisons.time_compare_identical(10000, 'U', False, '!=')
+         223±7μs          237±2μs     1.06  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', True, '!=')
-       131±0.3μs        122±0.7μs     0.94  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', False, '!=')
-     1.27±0.01μs      1.18±0.02μs     0.93  bench_strings.StringComparisons.time_compare_identical(100, 'U', False, '!=')
-     1.79±0.01μs      1.40±0.01μs     0.78  bench_strings.StringComparisons.time_compare_identical(100, 'U', True, '<=')
-     1.30±0.03μs      1.01±0.01μs     0.78  bench_strings.StringComparisons.time_compare_identical(100, 'U', False, '<=')
-     1.29±0.02μs         999±10ns     0.78  bench_strings.StringComparisons.time_compare_identical(100, 'U', False, '==')
-      53.8±0.5μs       41.7±0.3μs     0.78  bench_strings.StringComparisons.time_compare_identical(10000, 'U', False, '>=')
-     1.80±0.01μs      1.40±0.01μs     0.77  bench_strings.StringComparisons.time_compare_identical(100, 'U', True, '>=')
-        1.28±0μs         991±20ns     0.77  bench_strings.StringComparisons.time_compare_identical(100, 'U', False, '>')
-     1.78±0.01μs      1.38±0.01μs     0.77  bench_strings.StringComparisons.time_compare_identical(100, 'U', True, '>')
-     1.28±0.01μs          984±4ns     0.77  bench_strings.StringComparisons.time_compare_identical(100, 'U', False, '>=')
-        1.27±0μs          975±2ns     0.77  bench_strings.StringComparisons.time_compare_identical(100, 'U', False, '<')
-        54.3±2μs       41.6±0.3μs     0.77  bench_strings.StringComparisons.time_compare_identical(10000, 'U', False, '<=')
-         970±8ns          740±3ns     0.76  bench_strings.StringComparisons.time_compare_identical(100, 'S', False, '>')
-       213±0.4μs          162±2μs     0.76  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', True, '>=')
-        54.1±2μs       41.1±0.3μs     0.76  bench_strings.StringComparisons.time_compare_identical(10000, 'U', False, '>')
-         104±1μs       79.0±0.3μs     0.76  bench_strings.StringComparisons.time_compare_identical(10000, 'U', True, '<=')
-     1.22±0.01μs          926±4ns     0.76  bench_strings.StringComparisons.time_compare_identical(100, 'S', True, '>=')
-      52.9±0.5μs       40.1±0.5μs     0.76  bench_strings.StringComparisons.time_compare_identical(10000, 'U', False, '==')
-         976±5ns         739±20ns     0.76  bench_strings.StringComparisons.time_compare_identical(100, 'S', False, '<=')
-     54.2±0.06μs       40.7±0.1μs     0.75  bench_strings.StringComparisons.time_compare_identical(10000, 'U', False, '<')
-        1.81±0μs      1.36±0.01μs     0.75  bench_strings.StringComparisons.time_compare_identical(100, 'U', True, '==')
-         106±6μs       79.5±0.6μs     0.75  bench_strings.StringComparisons.time_compare_identical(10000, 'U', True, '>=')
-        984±20ns          738±5ns     0.75  bench_strings.StringComparisons.time_compare_identical(100, 'S', False, '>=')
-       106±0.5μs       79.2±0.4μs     0.75  bench_strings.StringComparisons.time_compare_identical(10000, 'U', True, '>')
-         104±1μs       77.7±0.6μs     0.75  bench_strings.StringComparisons.time_compare_identical(10000, 'U', True, '==')
-        966±10ns          719±4ns     0.74  bench_strings.StringComparisons.time_compare_identical(100, 'S', False, '==')
-     1.85±0.02μs      1.36±0.02μs     0.74  bench_strings.StringComparisons.time_compare_identical(100, 'U', True, '<')
-       966±0.7ns          711±2ns     0.74  bench_strings.StringComparisons.time_compare_different(100, 'S', False, '<=')
-        1.23±0μs          901±6ns     0.73  bench_strings.StringComparisons.time_compare_identical(100, 'S', True, '>')
-        982±10ns          721±3ns     0.73  bench_strings.StringComparisons.time_compare_different(100, 'S', False, '>')
-         108±1μs      78.8±0.06μs     0.73  bench_strings.StringComparisons.time_compare_identical(10000, 'U', True, '<')
-     1.19±0.03μs          873±8ns     0.73  bench_strings.StringComparisons.time_compare_identical(100, 'S', True, '<')
-     1.22±0.01μs          894±6ns     0.73  bench_strings.StringComparisons.time_compare_identical(100, 'S', True, '<=')
-         974±7ns          713±2ns     0.73  bench_strings.StringComparisons.time_compare_different(100, 'S', False, '<')
-     1.20±0.02μs          874±6ns     0.73  bench_strings.StringComparisons.time_compare_different(100, 'S', True, '!=')
-         970±1ns          706±6ns     0.73  bench_strings.StringComparisons.time_compare_different(100, 'S', False, '!=')
-         970±9ns          705±5ns     0.73  bench_strings.StringComparisons.time_compare_different(100, 'S', False, '==')
-     1.22±0.02μs          887±4ns     0.72  bench_strings.StringComparisons.time_compare_different(100, 'S', True, '>=')
-     1.21±0.01μs          878±8ns     0.72  bench_strings.StringComparisons.time_compare_different(100, 'S', True, '<=')
-         977±8ns          707±4ns     0.72  bench_strings.StringComparisons.time_compare_identical(100, 'S', False, '<')
-     1.22±0.01μs          884±2ns     0.72  bench_strings.StringComparisons.time_compare_different(100, 'S', True, '>')
-         218±7μs          158±2μs     0.72  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', True, '<=')
-         218±7μs        158±0.9μs     0.72  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', True, '<')
-     1.23±0.01μs          890±4ns     0.72  bench_strings.StringComparisons.time_compare_different(100, 'S', True, '<')
-        996±30ns          718±2ns     0.72  bench_strings.StringComparisons.time_compare_different(100, 'S', False, '>=')
-         978±5ns          698±2ns     0.71  bench_strings.StringComparisons.time_compare_identical(100, 'S', False, '!=')
-     1.25±0.02μs          891±4ns     0.71  bench_strings.StringComparisons.time_compare_identical(100, 'S', True, '==')
-         228±1μs          160±1μs     0.70  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', True, '>')
-         226±1μs          158±1μs     0.70  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', True, '==')
-     1.23±0.03μs          859±5ns     0.70  bench_strings.StringComparisons.time_compare_identical(100, 'S', True, '!=')
-     1.24±0.04μs         850±10ns     0.68  bench_strings.StringComparisons.time_compare_different(100, 'S', True, '==')
-         115±7μs       78.1±0.2μs     0.68  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', True, '>=')
-      59.4±0.8μs      39.5±0.09μs     0.66  bench_strings.StringComparisons.time_compare_identical(10000, 'S', True, '>=')
-      31.2±0.1μs       20.1±0.6μs     0.65  bench_strings.StringComparisons.time_compare_identical(10000, 'S', False, '>=')
-         113±6μs       72.5±0.3μs     0.64  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', True, '==')
-       128±0.5μs         81.6±1μs     0.64  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', False, '>=')
-       130±0.8μs       82.6±0.9μs     0.64  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', False, '==')
-        53.9±3μs       34.0±0.2μs     0.63  bench_strings.StringComparisons.time_compare_different(10000, 'S', True, '<')
-       131±0.5μs       82.4±0.5μs     0.63  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', False, '<=')
-        55.1±5μs       34.6±0.3μs     0.63  bench_strings.StringComparisons.time_compare_different(10000, 'S', True, '<=')
-        58.4±1μs       36.5±0.1μs     0.63  bench_strings.StringComparisons.time_compare_identical(10000, 'S', True, '<=')
-         966±7ns          603±3ns     0.62  bench_strings.StringComparisons.time_compare_different(100, 'U', False, '<')
-       131±0.9μs       81.6±0.8μs     0.62  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', False, '>')
-        29.4±2μs       18.3±0.6μs     0.62  bench_strings.StringComparisons.time_compare_identical(10000, 'S', False, '>')
-        972±20ns          603±5ns     0.62  bench_strings.StringComparisons.time_compare_different(100, 'U', False, '!=')
-         131±2μs       81.1±0.6μs     0.62  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'U', False, '<')
-        59.3±2μs       36.6±0.3μs     0.62  bench_strings.StringComparisons.time_compare_identical(10000, 'S', True, '==')
-         954±6ns          586±2ns     0.61  bench_strings.StringComparisons.time_compare_different(100, 'U', False, '<=')
-        971±40ns          587±4ns     0.60  bench_strings.StringComparisons.time_compare_different(100, 'U', False, '>=')
-        977±10ns          590±1ns     0.60  bench_strings.StringComparisons.time_compare_different(100, 'U', False, '==')
-      27.3±0.7μs       16.5±0.2μs     0.60  bench_strings.StringComparisons.time_compare_different(10000, 'S', False, '<')
-        30.3±2μs       18.3±0.6μs     0.60  bench_strings.StringComparisons.time_compare_different(10000, 'S', False, '<=')
-         113±7μs       67.8±0.5μs     0.60  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', True, '>')
-         988±4ns          593±1ns     0.60  bench_strings.StringComparisons.time_compare_different(100, 'U', False, '>')
-        51.9±2μs         31.1±1μs     0.60  bench_strings.StringComparisons.time_compare_different(10000, 'S', True, '==')
-     31.2±0.09μs       18.6±0.5μs     0.60  bench_strings.StringComparisons.time_compare_identical(10000, 'S', False, '==')
-        59.6±4μs         35.4±2μs     0.59  bench_strings.StringComparisons.time_compare_identical(10000, 'S', True, '>')
-       122±0.2μs       72.3±0.2μs     0.59  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', True, '>')
-         116±6μs       68.2±0.2μs     0.59  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', True, '!=')
-      31.3±0.5μs       18.3±0.5μs     0.59  bench_strings.StringComparisons.time_compare_identical(10000, 'S', False, '<=')
-       122±0.6μs         71.1±3μs     0.58  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', True, '<=')
-      30.2±0.6μs       17.5±0.9μs     0.58  bench_strings.StringComparisons.time_compare_identical(10000, 'S', False, '<')
-        70.7±2μs      40.7±0.07μs     0.58  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', False, '>=')
-        1.21±0μs         697±20ns     0.57  bench_strings.StringComparisons.time_compare_different(100, 'U', True, '!=')
-         112±7μs         64.3±3μs     0.57  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', True, '>=')
-        59.5±1μs       33.8±0.2μs     0.57  bench_strings.StringComparisons.time_compare_identical(10000, 'S', True, '!=')
-         114±9μs         64.7±2μs     0.57  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', True, '!=')
-        30.3±1μs       17.1±0.5μs     0.57  bench_strings.StringComparisons.time_compare_different(10000, 'S', False, '!=')
-      31.0±0.7μs      17.5±0.08μs     0.56  bench_strings.StringComparisons.time_compare_identical(10000, 'S', False, '!=')
-         119±6μs         67.0±3μs     0.56  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', True, '<')
-        60.4±2μs       33.8±0.2μs     0.56  bench_strings.StringComparisons.time_compare_different(10000, 'S', True, '!=')
-        60.8±3μs       34.0±0.4μs     0.56  bench_strings.StringComparisons.time_compare_different(10000, 'S', True, '>=')
-         118±3μs         65.1±2μs     0.55  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', True, '<')
-        30.9±2μs       17.0±0.7μs     0.55  bench_strings.StringComparisons.time_compare_different(10000, 'S', False, '>')
-      31.1±0.2μs       17.0±0.6μs     0.55  bench_strings.StringComparisons.time_compare_different(10000, 'S', False, '>=')
-     1.22±0.01μs          667±2ns     0.55  bench_strings.StringComparisons.time_compare_different(100, 'U', True, '<')
-        1.23±0μs          668±5ns     0.54  bench_strings.StringComparisons.time_compare_different(100, 'U', True, '>=')
-     1.23±0.01μs          666±7ns     0.54  bench_strings.StringComparisons.time_compare_different(100, 'U', True, '==')
-     1.25±0.02μs         673±10ns     0.54  bench_strings.StringComparisons.time_compare_different(100, 'U', True, '<=')
-         120±3μs         64.6±3μs     0.54  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', True, '<=')
-     1.24±0.02μs          665±1ns     0.54  bench_strings.StringComparisons.time_compare_different(100, 'U', True, '>')
-        29.6±2μs       15.8±0.5μs     0.53  bench_strings.StringComparisons.time_compare_different(10000, 'S', False, '==')
-     25.3±0.08μs      13.1±0.07μs     0.52  bench_strings.StringComparisons.time_compare_different(10000, 'U', False, '>=')
-        74.1±3μs      38.0±0.07μs     0.51  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', False, '==')
-      25.2±0.3μs       12.9±0.1μs     0.51  bench_strings.StringComparisons.time_compare_different(10000, 'U', False, '>')
-      25.3±0.2μs      12.9±0.08μs     0.51  bench_strings.StringComparisons.time_compare_different(10000, 'U', False, '<=')
-     25.0±0.08μs       12.7±0.1μs     0.51  bench_strings.StringComparisons.time_compare_different(10000, 'U', False, '<')
-         117±3μs         59.1±3μs     0.50  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', True, '==')
-      25.0±0.1μs       12.5±0.2μs     0.50  bench_strings.StringComparisons.time_compare_different(10000, 'U', False, '==')
-        75.7±3μs         37.4±1μs     0.49  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', False, '>')
-      25.6±0.6μs       12.6±0.1μs     0.49  bench_strings.StringComparisons.time_compare_different(10000, 'U', False, '!=')
-        72.7±2μs         34.2±1μs     0.47  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', False, '>=')
-      75.8±0.4μs      35.3±0.05μs     0.47  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', False, '<=')
-        75.7±2μs       35.1±0.1μs     0.46  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', False, '<=')
-        76.0±2μs       34.9±0.1μs     0.46  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', False, '<')
-        76.4±2μs       35.1±0.2μs     0.46  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', False, '<')
-        76.0±3μs         34.0±1μs     0.45  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', False, '!=')
-        78.6±1μs         33.9±1μs     0.43  bench_strings.StringComparisons.time_compare_identical((1000, 20), 'S', False, '!=')
-        78.8±3μs         33.9±1μs     0.43  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', False, '>')
-      60.4±0.2μs       25.7±0.1μs     0.43  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', False, '>=')
-        76.1±2μs         31.3±1μs     0.41  bench_strings.StringComparisons.time_compare_different((1000, 20), 'S', False, '==')
-      62.5±0.4μs       25.7±0.1μs     0.41  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', False, '<=')
-      63.0±0.3μs       25.7±0.1μs     0.41  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', False, '>')
-      62.5±0.7μs      25.1±0.03μs     0.40  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', False, '==')
-        64.6±2μs      25.6±0.07μs     0.40  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', False, '<')
-        64.8±2μs       25.5±0.2μs     0.39  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', False, '!=')
-      48.7±0.3μs       18.8±0.4μs     0.39  bench_strings.StringComparisons.time_compare_different(10000, 'U', True, '<=')
-      48.9±0.2μs       18.6±0.2μs     0.38  bench_strings.StringComparisons.time_compare_different(10000, 'U', True, '<')
-      48.6±0.2μs       18.4±0.1μs     0.38  bench_strings.StringComparisons.time_compare_different(10000, 'U', True, '!=')
-      49.0±0.6μs       18.5±0.2μs     0.38  bench_strings.StringComparisons.time_compare_different(10000, 'U', True, '==')
-      48.9±0.5μs      18.4±0.09μs     0.38  bench_strings.StringComparisons.time_compare_different(10000, 'U', True, '>')
-      49.4±0.6μs       18.5±0.2μs     0.38  bench_strings.StringComparisons.time_compare_different(10000, 'U', True, '>=')
-      97.0±0.3μs       36.2±0.2μs     0.37  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', True, '!=')
-      96.9±0.7μs       35.9±0.2μs     0.37  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', True, '<')
-      98.0±0.4μs       36.2±0.8μs     0.37  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', True, '<=')
-      97.9±0.4μs       36.1±0.4μs     0.37  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', True, '>=')
-      97.9±0.3μs       35.7±0.7μs     0.36  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', True, '>')
-      98.1±0.8μs       35.6±0.3μs     0.36  bench_strings.StringComparisons.time_compare_different((1000, 20), 'U', True, '==')

mhvk · 2022-02-13T21:45:02Z

@seberg - those are very impressive improvements!! I'm rather puzzled by this, though:

+         104±1μs        118±0.4μs     1.14  bench_strings.StringComparisons.time_compare_identical(10000, 'U', True, '!=')
-         104±1μs       77.7±0.6μs     0.75  bench_strings.StringComparisons.time_compare_identical(10000, 'U', True, '==')

How is it possible that '!=' is 50% slower than '=='?? I would have expected the two to be identical (as was the case in the old code). And indeed, all the other comparisons are the same (within uncertainties) as '=='.

The same 50% penalty holds when not using rstrip, so that is not the problem:

+      54.6±0.9μs      60.3±0.09μs     1.10  bench_strings.StringComparisons.time_compare_identical(10000, 'U', False, '!=')
-      52.9±0.5μs       40.1±0.5μs     0.76  bench_strings.StringComparisons.time_compare_identical(10000, 'U', False, '==')

As well as for the 2-d case.

seberg · 2022-02-13T21:48:52Z

Hmmm, no idea, without thinking about it, I would have assumed that all comparisons will have the same speed. Is there some reason the compiler can optimize some path more than the other, or just random?

mhvk · 2022-02-13T21:59:55Z

The timing differences are very systematic: all comparisons are the same except '!=' for 'U'. I cannot see any reason for it in your code; after all, the base function doing the comparisons does not even know what comparison is being done; it just returns -1, 0, or +1. Maybe it is some kind of weird fluke?

mhvk · 2022-02-13T22:00:24Z

I.e., maybe worth running the benchmarks again?

seberg · 2022-02-13T22:32:51Z

Its reproducible on my comp, also if I change the order in which the benchmarks run. The compiler must be spitting something different for !=.

mhvk · 2022-02-13T22:46:07Z

Weird, I just thought I'd try to reproduce it too, and I cannot:

In [1]: import numpy as np

In [2]: a = np.arange(100000.0).astype("U")

In [3]: b = a.copy()

In [4]: %timeit a != b
2.23 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit a == b
2.29 ms ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

(For both, current master gives 3.46 ms, so I confirm the great speed-up! I also tried with np.arange(10000), just to be closer to the benchmark.)

seberg · 2022-02-13T22:54:58Z

To be honest, I currently don't care much about this oddness related to the core of the loop right now. Could be compiler version or even hardware differences (gcc 11.2 and a Ryzen 3 here) 🤷.
But if it is specific to my hardware, maybe that adds a reason to not worry about it?

If anyone to squeeze performance, I would suggest implementing the get_loop and looking into specializing it for short/long strings (and even contiguous+short or so), I also didn't even try adding NPY_GCC_OPT3. Is that worthwhile for string arrays?

mhvk · 2022-02-13T23:18:34Z

@seberg - agreed that in the larger scheme of things this just doesn't matter. The code is much nicer and faster, what else could one ask for!?

seberg · 2022-02-14T19:19:22Z

OK, I think I covered all the important paths, so this should be ready now for another review and generally ready modulo smaller header or other cleanups.
The only tricky thing would be if anyone is worried about that FutureWarning change aligning strings with other dtypes in comparisons.

(We also really need to deprecate unicode + string -> unicode promotion at some point, but that is unrelated.)

mhvk

Looks good! Mostly just a reiteration of @serge-sans-paille's suggestion to create an 'string_ufuncs.h', plus a small nit on one of the tests.

Though I'd prefer a more C++ adept person to sign off on that part!

mhvk · 2022-02-14T20:14:39Z

numpy/core/src/umath/string_ufuncs.cpp

+}
+
+
+extern "C" {


Like below: I think it makes sense to create a 'strings_ufunc.h' that contains these definitions.

Maybe @mattip, you have a good idea? I had that first, and it should work for the init_string_ufuncs, but the umath/multiarray boundary is always creating some headaches: it seems hard to include that header from multiarray somehow?

I admit, it is very likely I am just doing something stupid.

Also @serge-sans-paille, I thought another problem was that the symbol export didn't actually work right when I had it as its own header here. But I guess that may have been just a bad try?

Maybe good to separate what should more obviously work, which is to put the ones double-defined here in 'strings_ufunc.h' and import that new .h above. And only then worry about the other places.

Regarding multiarray/umath, I thought we had tried to harmonize those, but not sure...

Yes, they are now one build-unit, which is why this works at all without much hassle. But including umath stuff in multiarray is still tricky (I am not exactly sure why)

OK, moved things back, it is working fine, but did not try to cross the multiarray boundary yet. I guess one thing could be that I did not actually include other stuff in the header right now (not sure if I should).

numpy/core/src/umath/umathmodule.c

numpy/core/tests/test_strings.py

serge-sans-paille · 2022-02-14T23:40:34Z

numpy/core/src/umath/string_ufuncs.cpp

+template <typename character>
+static NPY_INLINE int
+character_cmp(character a, character b)
+{


yeah, maybe something other than an in is better as typename std::intermediate_type<npy_ucs4, npy_byte>::type

serge-sans-paille · 2022-02-14T23:44:24Z

numpy/core/src/umath/string_ufuncs.cpp

+                res = cmp >= 0;
+                break;
+            default:
+                assert(false);


the right static_assert here would be (in the function prelude`

static_assert(Py_EQ == comp || Py_NE == comp || ..., "valid comparison");

or something based on a range if you know for sure which enumerator is the lowest one.

numpy/core/src/umath/string_ufuncs.cpp

serge-sans-paille · 2022-02-14T23:48:08Z

numpy/core/src/umath/string_ufuncs.cpp

+        case Py_GE:
+            return string_comparison_loop<rstrip, Py_GE, character>;
+        default:
+            assert(false);


ok, seeing the same assert again, what you probably should do is

static constexpr is_valid_comp(int comp) { return comp == Py_LT || ...; }

and use it either in fonction prelude, either as an assert or a static_assert depending on the context

Or even better: make the int an enum class but that's probably not possible, just mentioning it.

OK, I didn't love the idea of a long duplicate Py_EQ || ... where the asserts were, but if it goes into the function prelude, I can see the point, will adopt :).

clang static analysis seems to not be quite smart enough to realize that the switch will always match, but I guess I will ignore that (it warns that res may be undefined in the other case).

I guess I could use default for the last one, but dunno.

Oh, silly me... This assert is actually not static at all.

Sorry for the verboseness. Pushed a change with that the enum class instead, figured that the static assert was just as verbose anyway...

serge-sans-paille · 2022-02-15T07:39:30Z

@seberg : feel free to cherry-pick serge-sans-paille@0bf1a47

seberg · 2022-02-15T18:42:47Z

Done that cherry pick, but also an STY commit because some style stuff I just can't agree with ;). Left some long lines, wasn't sure how to break them, and I think they are somewhat fine (i.e. nothing non-obvious hidden), although not great.

I am happy with this, it is nice that it removes the duplication, it adds a bit of "C++" knowledge requirement to truly understand what is going on, but I guess that is OK.

dlee992 · 2022-06-10T03:02:07Z

@seberg, @charris, just "a gentle hi". Recently I want to contribute a PR about string ufunc in Numba (numba/numba#8104), which is empty PR right now. I discussed with Numba members, and we hope to keep consistent with NumPy handling way. And I saw this PR! Very impressive and we need it (especially SS->?, UU->?, and unify string with other types in ufunc loop)!
I wonder whether this PR can be merged into NumPy 1.23.0 version ASAP, since I am pending on this very much. Haha. Thanks!

charris · 2022-06-10T13:52:21Z

Yeah, let's give this a shot. Thanks Sebastian. I'm going to do an rc3 this weekend.

Looks like this needs a release note. @seberg Since I will backport this for @dlee992, could you add the release note directly against the 1.23.x branch after this is backported?

* ENH: Implement string comparison ufuncs (or almost) This makes all comparison operators and ufuncs work on strings using the ufunc machinery. It requires a half-manual "ufunc" to keep supporting void comparisons and especially `np.compare_chararrays` (that one may have a bit more overhead now). In general the new code should be much faster, and has a lot of easier optimization potential. It is also much simpler since it can outsource some complexities to the ufunc/iterator machinery. This further fixes a couple of bugs with byte-swapped strings. The backward compatibility related change is that using the normal ufunc machinery means that string comparisons between string and unicode now give a `FutureWarning` (instead of just False). * MAINT: Do not use C99 tagged struct init in C++ C++ does not like it (at least not before C++20)... GCC and clang don't seem to mind, but MSVC seems to. * BENCH: Add basic string comparison benchmarks * DOC,STY: Fixup string-comparisons comments based on review Thanks to Marten's comments, a few clarfications and slight fixups. * ENH: Use `memcmp` because it may be faster for the byte case * TST: Improve string and unicode comparison tests. * MAINT: Use switch statement based on review As suggested be Serge. Co-authored-by: Serge Guelton <serge.guelton@telecom-bretagne.eu> * TST: Make unicode byte-swap test slightly more concrete The issue is that the `view` needs to use native byte-order, so just ensure native byte-order for the view, and then do another cast to get it right. * BUG: Add `np.compare_chararrays` to test and fix typo * TST: Add test for empty string comparisons * TST: Fixup string test based on martens review * MAINT: Move definitions back into string_ufuncs.h * MAINT: Use enum class for comparison operator templating This removes the need for a dynamic (or static) assert in the switch statement. * Template version of add_loop to avoid redundant code * STY: Fixup style, two spaces, error is -1 * STY: Small `string_ufuncs.cpp` fixups based on Serge's review * MAINT: Fix merge conflict (ensure_dtype_nbo was removed) Co-authored-by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

charris · 2022-06-10T13:56:49Z

The backport is in #21716.

seberg · 2022-06-10T13:56:55Z

Oh we are giving a backport a shot? Will there be another (brief) rc? I don't really expect this to create issues, but it might have some very subtle changes.

I don't mind backporting, I do wonder though. @dlee992 are you interested in NumPy supporting np.equal(), etc. on the strings. Or do you need the API to extend NumPy ufuncs yourself in that way. Because the API here is only public experimentally right now.

charris · 2022-06-10T14:00:12Z

Will there be another (brief) rc?

Yes, there are already > 10 backports since rc2. And then I just want to sit on it for two weeks or so and stop doing that :) The zero release is the true NumPy rc, since that is when widespread testing really happens.

charris · 2022-06-10T14:01:48Z

If @dlee992 decides that a backport will not serve, the we can close the open PR for that.

dlee992 · 2022-06-10T14:59:50Z

I don't mind backporting, I do wonder though. @dlee992 are you interested in NumPy supporting np.equal(), etc. on the strings. Or do you need the API to extend NumPy ufuncs yourself in that way. Because the API here is only public experimentally right now.

@seberg, hi. I think I just need NumPy supporting np.equal(), etc. on the strings, not to extend other NumPy ufuncs by myself. BTW, I'd guess there will be other PRs to add SS->?/UU->? into np.equal.types etc? Since numba also relies on np.equal.types to implement its ufunc capabilities (maybe because it was implemented by Pitrou several years ago, which I am not sure).

If @dlee992 decides that a backport will not serve, the we can close the open PR for that.

@charris, hi, I need the very backport. I am glad it can be backported to 1.23.0.

Thanks for ur quick response very much!

seberg · 2022-06-10T15:30:10Z

@dlee992 np.equal.types is unmodified right now, I had not planned on finding a solution to that at this time. I suppose we could try (there may not be a complete solution, so I expect we may need to add new Python/C-API for numba in any case if you need this).

We should be able to expand the output of np.equal.types (but not the binary version stored on the ufunc struct) in some of the clear cases (such as np.equal.types).

Note that np.equal.types lists old-style loops, but these are new-style loops (they have a different signature).

dlee992 · 2022-06-13T09:21:03Z

@dlee992 np.equal.types is unmodified right now, I had not planned on finding a solution to that at this time. I suppose we could try (there may not be a complete solution, so I expect we may need to add new Python/C-API for numba in any case if you need this).

@seberg, maybe I can try to follow these parts of NumPy code, and try to extend np.equal.types for NumPy, if u don't have time to do this. But I am a newbie for NumPy, I will give it a try. Any suggestion is helpful to me!

We should be able to expand the output of np.equal.types (but not the binary version stored on the ufunc struct) in some of the clear cases (such as np.equal.types).

I'd guess these related stuff are implemented in string_ufuncs.cpp and arrayobject.c? IIRC, numba only uses np.equal.types as a string list to select an appropriate signature, and nothing more (not quite sure), it will generate compiled version by itself (some discussion is here numba/numba#7859).

seberg · 2022-06-13T14:44:50Z

@seberg, maybe I can try to follow these parts of NumPy code, and try to extend np.equal.types for NumPy

Of course, the problem is timing. If we really need this in the next release we have to do it super-fast and convince Chuck that backporting matters. It is probably not hard, but it seems a bit steep to me.
There may also be some subtleties, although for you it just matters to have the strings there (i.e. not a full solution, but something good enough for you).

I'd guess these related stuff are implemented in string_ufuncs.cpp and arrayobject.c? IIRC, numba only uses np.equal.types as a string list to select an appropriate signature, and nothing more (not quite sure)

This happens in umath/ufunc_object. there is a getter for the types. The problem is that the string loops have a different ABI then the "legacy" loops. And the ufunc.types are a summary of the "legacy" loops and do not contain the new ABI loops.
Even worse, a list of strings is probably not sufficient for a future-proof exposing in any case (i.e. it may work for strings, but not generally).

A "workaround" might be to add a "legacy" loop that is never used and just does nothing/sets an error if used. But that is pretty annoying.
The other workaround is to just manually extend the Python ufunc.types to include some of the new loops (if possible to describe them using simple strings).

dlee992 · 2022-06-14T07:29:19Z

@seberg, many thanks! But in fact, I cannot follow ur thought about extending ufunc.types right now, I have read Numba doc several times and opened several PRs for Numba, but I just began to read NumPy doc e.g., Numpy C-API doc, and build NumPy from source. I have to get a big picture about various concepts used in NumPy more carefully, then I guess I can follow this special string ufunc stuff and your workaround thought. I will spend several days in these parts and be back after this process.

PS, I think I am not that hurry about extending ufunc.types in NumPy 1.23.0 (maybe 1.23.1 is also OK for me), we can do it better with more thoughtful discussion. BTW, I saw the DDL is June 15 for 1.23.0 milestone, it is precise or will be delayed for two weeks?

Thanks again!

seberg · 2022-06-14T12:58:33Z

If there is consensus that returning two extra entries in np.equal.types we could do that later maybe, the question is whether we should worry about any backwards compatibility issue (say because someone auto-generates things from it, similar to numba).
In that case, I am not sure if it is a "bugfix" to land in 1.23.1 as opposed to 1.24 (we have a half-year release cadence).

We are now in the third rc for 1.23, so I would not expect many more changes unless something comes up. In the end it is up to Chuck though.

The following is what returns ufunc.types to Python:

numpy/numpy/core/src/umath/ufunc_object.c

Lines 6336 to 6368 in 703a6fa

    
           static PyObject * 
        
           ufunc_get_types(PyUFuncObject *ufunc, void *NPY_UNUSED(ignored)) 
        
           { 
        
               /* return a list with types grouped input->output */ 
        
               PyObject *list; 
        
               PyObject *str; 
        
               int k, j, n, nt = ufunc->ntypes; 
        
               int ni = ufunc->nin; 
        
               int no = ufunc->nout; 
        
               char *t; 
        
               list = PyList_New(nt); 
        
               if (list == NULL) { 
        
                   return NULL; 
        
               } 
        
               t = PyArray_malloc(no+ni+2); 
        
               n = 0; 
        
               for (k = 0; k < nt; k++) { 
        
                   for (j = 0; j<ni; j++) { 
        
                       t[j] = _typecharfromnum(ufunc->types[n]); 
        
                       n++; 
        
                   } 
        
                   t[ni] = '-'; 
        
                   t[ni+1] = '>'; 
        
                   for (j = 0; j < no; j++) { 
        
                       t[ni + 2 + j] = _typecharfromnum(ufunc->types[n]); 
        
                       n++; 
        
                   } 
        
                   str = PyUnicode_FromStringAndSize(t, no + ni + 2); 
        
                   PyList_SET_ITEM(list, k, str); 
        
               } 
        
               PyArray_free(t); 
        
               return list; 
        
           }

As you can see it reads ufunc->types. But NumPy now also has another list of "loops", which is stored as an actual list in ufunc->_loops. Now, we could scan/use ufunc->_loops rather than ufunc->types. The problems are:

The ufunc->_loops can have additional entries we need to skip.
In principle at least, ufunc->_loops could have entries that are not representable by single characters (breaking the formatting).
ufunc->_loops can get outdated for numba's DUFuncs, so you probably still need to have a check on ufunc->types (at least for changes).

seberg · 2022-06-16T15:20:19Z

@dlee992 for your awareness. It seems we should (and will) pull this out of NumPy 1.23. It leads to subtle changes for subclasses (subclasses were sometimes ignored before). That leads to subtle changes/breaks in MaskedArrays paths (that were arguably always a bit bad). And those breaks domino into failures for astropy...

dlee992 · 2022-06-17T02:35:42Z

@seberg, thanks! I will stay tuned about this ENH.

github-actions bot added the 01 - Enhancement label Feb 12, 2022

seberg commented Feb 12, 2022

View reviewed changes

seberg force-pushed the string-ufuncs branch 2 times, most recently from c549c66 to 26f0be6 Compare February 12, 2022 22:43

seberg commented Feb 12, 2022

View reviewed changes

numpy/core/src/multiarray/convert_datatype.h Show resolved Hide resolved

seberg commented Feb 12, 2022

View reviewed changes

seberg force-pushed the string-ufuncs branch from 26f0be6 to c7dbb38 Compare February 12, 2022 23:26

seberg force-pushed the string-ufuncs branch from fd098d9 to aa9a9f1 Compare February 13, 2022 17:16

mhvk reviewed Feb 13, 2022

View reviewed changes

numpy/core/src/umath/string_ufuncs.cpp Outdated Show resolved Hide resolved

numpy/core/src/umath/string_ufuncs.cpp Outdated Show resolved Hide resolved

numpy/core/src/umath/string_ufuncs.cpp Outdated Show resolved Hide resolved

numpy/core/src/umath/string_ufuncs.cpp Show resolved Hide resolved

serge-sans-paille reviewed Feb 13, 2022

View reviewed changes

seberg marked this pull request as ready for review February 13, 2022 20:53

seberg force-pushed the string-ufuncs branch from aa9a9f1 to 623cea5 Compare February 13, 2022 20:53

mhvk reviewed Feb 14, 2022

View reviewed changes

serge-sans-paille reviewed Feb 14, 2022

View reviewed changes

seberg force-pushed the string-ufuncs branch from 62cbfc3 to 7611b3d Compare February 15, 2022 00:17

seberg force-pushed the string-ufuncs branch from 1b729c0 to 11d91ac Compare February 15, 2022 18:48

seberg mentioned this pull request May 12, 2022

.max() and .min() don't work on arrays of type 'S' and 'U' (Trac #1316) #1914

Open

charris added the 09 - Backport-Candidate PRs tagged should be backported label Jun 10, 2022

charris modified the milestones: 1.23.1 release, 1.23.0 release Jun 10, 2022

charris merged commit 71f7f7c into numpy:main Jun 10, 2022

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jun 10, 2022

charris removed this from the 1.23.0 release milestone Jun 10, 2022

seberg deleted the string-ufuncs branch June 10, 2022 14:08

seberg mentioned this pull request Jun 10, 2022

ENH: Implement string comparison ufuncs (or almost) #21716

Merged

seberg mentioned this pull request Jun 14, 2022

BUG: Casda failure with numpy 1.23.0rc3 astropy/astroquery#2440

Closed

pllim mentioned this pull request Jun 14, 2022

BUG: Table test failures with np 1.23.0rc3 astropy/astropy#13332

Closed

dlee992 mentioned this pull request Jun 15, 2022

Incorrect behavior with numpy array equality against a scalar unicode string numba/numba#7859

Open

mhvk mentioned this pull request Jun 15, 2022

BUG: MaskedArray with bytes dtype cannot be compared if fill_value is set #21770

Closed

bsipocz mentioned this pull request Jun 18, 2022

FIX: casda compatibility fix np1.23 astropy/astroquery#2441

Closed

		}


		extern "C" {

Uh oh!

ENH: Implement string comparison ufuncs (or almost) #21041

ENH: Implement string comparison ufuncs (or almost) #21041

Uh oh!

Conversation

seberg commented Feb 12, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seberg commented Feb 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mattip commented Feb 13, 2022

Uh oh!

mhvk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seberg Feb 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seberg Feb 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seberg commented Feb 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhvk commented Feb 13, 2022

Uh oh!

seberg commented Feb 13, 2022

Uh oh!

mhvk commented Feb 13, 2022

Uh oh!

mhvk commented Feb 13, 2022

Uh oh!

seberg commented Feb 13, 2022

seberg commented Feb 12, 2022 •

edited

Loading

seberg Feb 15, 2022 •

edited

Loading

seberg Feb 13, 2022 •

edited

Loading

seberg commented Feb 13, 2022 •

edited

Loading