ENH: Add SIMD versions of bool logical_&&,||,! and absolute #22167

Developer-Ecosystem-Engineering · 2022-08-23T19:45:27Z

NumPy has SIMD versions of BOOL logical_and, logical_or, logical_not, and absolute for SSE2. The changes here replace that implementation with one that uses universal intrinsics. This allows other architectures to have SIMD versions of the functions too.

BOOL logical_and and logical_or are particularly important for NumPy because np.all() and np.any() are implemented as logical_and and logical_or reductions, respectively.
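As a rough model (not NumPy's actual code), `np.all()` on a boolean array behaves like a `logical_and` reduction and `np.any()` like a `logical_or` reduction: a fold over the array that starts from the operation's identity (true for AND, false for OR). The sketch below represents a boolean array as a `bytes` object in which any nonzero byte counts as true; `model_all` and `model_any` are hypothetical names.

```python
from functools import reduce


def logical_and(a: int, b: int) -> int:
    """Scalar model of BOOL logical_and; the result is normalized to 0 or 1."""
    return 1 if (a and b) else 0


def logical_or(a: int, b: int) -> int:
    """Scalar model of BOOL logical_or; the result is normalized to 0 or 1."""
    return 1 if (a or b) else 0


def model_all(data: bytes) -> bool:
    # logical_and reduction starting from the identity True (1)
    return bool(reduce(logical_and, data, 1))


def model_any(data: bytes) -> bool:
    # logical_or reduction starting from the identity False (0)
    return bool(reduce(logical_or, data, 0))


flags = bytes([1, 1, 0, 1])
print(model_all(flags), model_any(flags))  # False True
print(model_all(b""), model_any(b""))      # True False
```

Because every call to `np.all()`/`np.any()` on booleans goes through these two loops, speeding them up benefits a very common code path.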

Apple M1: up to 16.5x faster

       before           after         ratio
     [7c143834]       [c49d9dc2]
     <main>           <logical/dev>
-        3.47±0μs      1.82±0.01μs     0.52  bench_reduce.AnyAll.time_all_slow
-     4.05±0.01μs      1.83±0.06μs     0.45  bench_reduce.AnyAll.time_any_slow
-        6.55±0μs          532±2ns     0.08  bench_ufunc.Custom.time_not_bool
-        10.2±0μs          680±7ns     0.07  bench_ufunc.Custom.time_or_bool
-     11.0±0.07μs          665±3ns     0.06  bench_ufunc.Custom.time_and_bool
                                                                                                                                                                    
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Apple M1 (Rosetta): up to 1.6x faster

       before           after         ratio
     [7c143834]       [1ad2f2f2]
     <main>           <logical/dev>
-     1.14±0.01μs         1.03±0μs     0.90  bench_ufunc.Custom.time_not_bool
-     4.38±0.03μs      3.06±0.02μs     0.70  bench_reduce.AnyAll.time_any_slow
-     5.20±0.01μs      3.17±0.01μs     0.61  bench_reduce.AnyAll.time_all_slow

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

iMac Pro (AVX512): up to 1.2x faster

       before           after         ratio
     [da6297b9]       [c49d9dc2]
     <main>           <logical/dev>
-     1.42±0.02μs      1.24±0.03μs     0.87  bench_ufunc.Custom.time_and_bool
-     4.16±0.03μs       3.60±0.1μs     0.86  bench_reduce.AnyAll.time_any_slow
-     1.21±0.03μs      1.03±0.02μs     0.86  bench_ufunc.Custom.time_not_bool
-      4.30±0.1μs      3.56±0.07μs     0.83  bench_reduce.AnyAll.time_all_slow

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
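The "up to" figures in the headings are the inverse of the smallest asv ratio in each table, i.e. before/after for the best row. A quick check using the times reported above, converted to nanoseconds:

```python
# Best-case rows copied from the three tables above, as (before_ns, after_ns).
best_rows = {
    "Apple M1, time_and_bool": (11_000, 665),
    "Apple M1 (Rosetta), time_all_slow": (5_200, 3_170),
    "iMac Pro (AVX512), time_all_slow": (4_300, 3_560),
}


def speedup(before_ns: float, after_ns: float) -> float:
    """Speedup factor; asv's 'ratio' column is the inverse (after / before)."""
    return before_ns / after_ns


for name, (before, after) in best_rows.items():
    print(f"{name}: {speedup(before, after):.1f}x faster, ratio {after / before:.2f}")
# Apple M1, time_and_bool: 16.5x faster, ratio 0.06
# Apple M1 (Rosetta), time_all_slow: 1.6x faster, ratio 0.61
# iMac Pro (AVX512), time_all_slow: 1.2x faster, ratio 0.83
```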

mattip

Nice. I will defer to @seiko2plus for the deeper SIMD review. A couple of general questions for all the latest SIMD PRs:

Could you check the binary file size change of _multiarray_umath*.so after vs. before?
Did the change have an effect on any other benchmarks besides the obvious ones that you reported?
If possible, could you run the benchmarks on a non-AVX512 x86_64 system?
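For the first question, a small helper can locate the extension module in two build trees and report the size difference. This is a sketch: it assumes each build leaves a single `_multiarray_umath*.so` somewhere under its build directory, and the directory layout is up to whoever runs it.

```python
import glob
import os


def so_size(build_dir: str, pattern: str = "_multiarray_umath*.so") -> int:
    """Size in bytes of the first file matching `pattern` anywhere under build_dir."""
    hits = sorted(glob.glob(os.path.join(build_dir, "**", pattern), recursive=True))
    if not hits:
        raise FileNotFoundError(f"no {pattern} found under {build_dir}")
    return os.path.getsize(hits[0])


def size_delta(before_dir: str, after_dir: str) -> int:
    """Bytes gained (positive) or saved (negative) by the 'after' build."""
    return so_size(after_dir) - so_size(before_dir)
```

For example, `size_delta("build-main", "build-pr")`, where both directory names are hypothetical placeholders for a build of main and a build of the PR branch.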

numpy/core/setup.py.orig

seiko2plus

Sorry for the delayed response. For fair optimization, and to avoid two separate implementations for reduction, the new universal intrinsics any/all are going to be implemented through PR #22306.
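The idea behind vector-wide any/all for reductions can be modeled in scalar Python: walk the input in fixed-width chunks standing in for SIMD registers, test each whole chunk at once, and stop at the first chunk that decides the answer. The 16-byte width and the function names are assumptions for illustration; this is not the code from #22306.

```python
LANES = 16  # assumed register width in bytes (a 128-bit vector)


def any_chunked(data: bytes) -> bool:
    """Model of a logical_or reduction that tests LANES bytes per step."""
    for start in range(0, len(data), LANES):
        chunk = data[start:start + LANES]  # the last chunk may be shorter (scalar tail)
        if any(chunk):  # stands in for one "any lane nonzero" vector test
            return True  # early exit: the result can no longer change
    return False


def all_chunked(data: bytes) -> bool:
    """Model of a logical_and reduction: stop at the first chunk holding a zero."""
    for start in range(0, len(data), LANES):
        chunk = data[start:start + LANES]
        if not all(chunk):  # stands in for one "all lanes nonzero" vector test
            return False
    return True
```

Testing a whole register per step replaces a per-element branch with one branch per chunk, while still allowing the reduction to stop early.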

numpy/core/src/umath/loops_logical.dispatch.c.src

Developer-Ecosystem-Engineering · 2022-09-27T17:18:40Z

I merged in the suggestions made to adopt the changes in #22306 (which didn't exist when this PR was written).

…nd absolute

NumPy has SIMD versions of BOOL `logical_and`, `logical_or`, `logical_not`, and `absolute` for SSE2. The changes here replace that implementation with one that uses universal intrinsics. This allows other architectures to have SIMD versions of the functions too. BOOL `logical_and` and `logical_or` are particularly important for NumPy as that's how `np.any()` / `np.all()` are implemented.

Co-authored-by: Sayed Adel <seiko@imavr.com>

seiko2plus

Excellent work, just one more nit.

numpy/core/setup.py.orig

numpy/core/src/umath/loops_logical.dispatch.c.src

Developer-Ecosystem-Engineering · 2022-12-07T23:42:01Z

Requested changes implemented, thanks!

seiko2plus

My apologies, it seems I wasn't entirely focused. One more thing.

numpy/core/src/umath/loops_logical.dispatch.c.src

seiko2plus

LGTM, thank you.
I have made some additional changes to the umath generator to enable runtime dispatching for the following aliases:

BOOL_invert, BOOL_add, BOOL_bitwise_and
BOOL_bitwise_or, BOOL_logical_xor
BOOL_bitwise_xor, BOOL_multiply
BOOL_maximum, BOOL_minimum, BOOL_fmax,
BOOL_fmin

The following functions are defined by the umath generator to enable runtime dispatching without the need to redefine them within dispatchable sources: BOOL_invert, BOOL_add, BOOL_bitwise_and, BOOL_bitwise_or, BOOL_logical_xor, BOOL_bitwise_xor, BOOL_multiply, BOOL_maximum, BOOL_minimum, BOOL_fmax, BOOL_fmin.
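These aliases can reuse the logical loops because, on normalized booleans (bytes that are exactly 0 or 1), each one computes the same truth table as a logical operation. The pairing below is a reading of the list above that matches NumPy's behavior for `bool` arrays (e.g. `add` acts as `logical_or`, `multiply` as `logical_and`); the dictionary is an illustrative model checked exhaustively over {0, 1}, not the generator's actual table.

```python
from itertools import product


# Scalar models of the logical operations on normalized booleans (0 or 1).
def logical_and(a: int, b: int) -> int:
    return int(bool(a) and bool(b))


def logical_or(a: int, b: int) -> int:
    return int(bool(a) or bool(b))


def logical_xor(a: int, b: int) -> int:
    return int(bool(a) != bool(b))


def logical_not(a: int) -> int:
    return int(not a)


# Each aliased op in its "natural" integer form, paired with the logical op it matches.
BINARY_ALIASES = {
    "bitwise_and": (lambda a, b: a & b, logical_and),
    "multiply": (lambda a, b: a * b, logical_and),
    "minimum": (min, logical_and),
    "fmin": (min, logical_and),  # no NaNs for bools, so fmin == minimum
    "bitwise_or": (lambda a, b: a | b, logical_or),
    "maximum": (max, logical_or),
    "fmax": (max, logical_or),
    "add": (lambda a, b: min(a + b, 1), logical_or),  # bool True + True stays True
    "bitwise_xor": (lambda a, b: a ^ b, logical_xor),
}


def check_aliases() -> bool:
    binary_ok = all(
        natural(a, b) == logical(a, b)
        for natural, logical in BINARY_ALIASES.values()
        for a, b in product((0, 1), repeat=2)
    )
    # invert on a 0/1 byte (flip the lowest bit) matches logical_not
    return binary_ok and all((a ^ 1) == logical_not(a) for a in (0, 1))


print(check_aliases())  # True
```

The equivalences depend on the 0/1 inputs: for a non-normalized byte such as 2, `2 & 1` is 0 while `logical_and(2, 1)` is 1.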

seiko2plus · 2022-12-08T17:34:05Z

The last push after the approval was to satisfy the Python linter and to fix the dispatch kwarg in the umath generator for logical_xor; both errors were caused by my latest patch.

Developer-Ecosystem-Engineering · 2022-12-14T17:27:54Z

@mattip Anything else missing/required?

mattip · 2022-12-15T03:05:19Z

Thanks @Developer-Ecosystem-Engineering and @seiko2plus

mattip · 2022-12-15T03:07:26Z

@Developer-Ecosystem-Engineering if the benchmarks at the top of the PR are not the latest ones, could you post updated ones here as a footnote to this PR?

tylerjereddy · 2022-12-21T00:12:57Z

Note that this showed up as problematic in a git bisect for gh-22845.

github-actions bot added the 01 - Enhancement label Aug 23, 2022

rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Aug 24, 2022

mattip reviewed Aug 24, 2022

numpy/core/setup.py.orig (outdated)

seiko2plus requested changes Sep 17, 2022

Developer-Ecosystem-Engineering and others added 6 commits December 6, 2022 17:22

Update numpy/core/src/umath/loops_logical.dispatch.c.src

dab5229

Co-authored-by: Sayed Adel <seiko@imavr.com>

Update numpy/core/src/umath/loops_logical.dispatch.c.src

94d4eae

Co-authored-by: Sayed Adel <seiko@imavr.com>

Update numpy/core/src/umath/loops_logical.dispatch.c.src

f144c89

Co-authored-by: Sayed Adel <seiko@imavr.com>

Add loops_logical.dispatch.c.src to meson.build

da72745

Finish adopting universal intrinsics to fix AVX2/AVX512 tests.

b067a42

Developer-Ecosystem-Engineering force-pushed the add_simd_bool_logical_andornot_absolute branch from d1ff56b to b067a42 on December 7, 2022 07:01

Fix smoke test running without NPY_SIMD

ae849e5

seiko2plus mentioned this pull request Dec 7, 2022

ENH: Implement SIMD versions of isnan,isinf, isfinite and signbit #22165

Merged

seiko2plus requested changes Dec 7, 2022

numpy/core/setup.py.orig (outdated)

numpy/core/src/umath/loops_logical.dispatch.c.src (outdated)

numpy/core/src/umath/loops_logical.dispatch.c.src (outdated)

Remove setup.py.orig and use npyv_any/all

2de8865

seiko2plus requested changes Dec 8, 2022

numpy/core/src/umath/loops_logical.dispatch.c.src

numpy/core/src/umath/loops_logical.dispatch.c.src (outdated)

Use mask_to_true() for logical_not to save a few ops

0609a34
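A lane-by-lane model of the trick named in this commit: a SIMD compare-equal-to-zero produces an all-ones byte (0xFF) in every lane that was zero and 0x00 elsewhere, which is already `logical_not` in mask form, so a single AND with 0x01 turns it into normalized booleans without a separate negation. The function names reuse the commit's wording, but the bodies are a pure-Python illustration of the technique, not the NumPy source.

```python
def cmpeq_zero(lanes: bytes) -> bytes:
    """Model of a byte-wise compare with zero: 0xFF where the lane is 0, else 0x00."""
    return bytes(0xFF if x == 0 else 0x00 for x in lanes)


def mask_to_true(mask: bytes) -> bytes:
    """Turn a 0xFF/0x00 lane mask into normalized booleans 1/0 with one AND."""
    return bytes(m & 0x01 for m in mask)


def logical_not(lanes: bytes) -> bytes:
    # "x == 0" is exactly "not x", so the compare mask already has the right polarity
    return mask_to_true(cmpeq_zero(lanes))


print(list(logical_not(bytes([0, 1, 2, 255]))))  # [1, 0, 0, 0]
```

Note that non-normalized inputs such as 2 or 255 still produce a clean 0, because the compare looks only at zero versus nonzero.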

seiko2plus self-assigned this Dec 8, 2022

seiko2plus force-pushed the add_simd_bool_logical_andornot_absolute branch 3 times, most recently from 7bb525f to 468a3da on December 8, 2022 17:01

seiko2plus approved these changes Dec 8, 2022

seiko2plus force-pushed the add_simd_bool_logical_andornot_absolute branch from 468a3da to c47e5ff on December 8, 2022 17:27

seiko2plus force-pushed the add_simd_bool_logical_andornot_absolute branch from c47e5ff to bfa444d on December 8, 2022 17:31

mattip merged commit 78a499d into numpy:main Dec 15, 2022

This was referenced Dec 20, 2022

REGR: Comparing boolean array to True returns incorrect uint8 value with more than 32 values on 1.24.0 #22840

Closed

np.logical_xor.accumulate fails on 1.24 on mac #22841

Closed
