ENH: Add SIMD versions of bool logical_&&,||,! and absolute #22167
Nice. I will defer to @seiko2plus for the deeper SIMD review. A couple of general questions for all the latest SIMD PRs:
- Could you check the binary file size change of `_multiarray_umath*.so` after vs. before?
- Did the change have an effect on any other benchmarks besides the obvious ones that you reported?
- If possible, could you run the benchmarks on a non-AVX512 x86_64 system?
Sorry for the delayed response. For fair optimization, and to avoid two separate implementations for reduction, new universal intrinsics any/all are going to be implemented through PR #22306.
I merged in the suggestions to adopt the changes in #22306 (which didn't exist when this PR was authored).
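Roughly speaking, an any/all reduction can test a whole chunk of bools at once and stop as soon as the answer is known. The following is a minimal sketch of that idea in portable C. It assumes 8-byte chunks held in a `uint64_t` instead of NumPy's vector types, and `bool_any`, `bool_all` and `has_zero_byte` are hypothetical names, not the intrinsics added in #22306.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL
#define HIGH 0x8080808080808080ULL

/* Non-zero result iff at least one byte of w is 0x00 (classic bit trick). */
static bool has_zero_byte(uint64_t w)
{
    return ((w - ONES) & ~w & HIGH) != 0;
}

/* Hypothetical: true if any byte is non-zero; exits on the first hit. */
bool bool_any(const uint8_t *x, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, x + i, 8);      /* unaligned-safe 8-byte load */
        if (w != 0)
            return true;           /* some byte in this chunk is true */
    }
    for (; i < n; i++)             /* scalar tail */
        if (x[i])
            return true;
    return false;
}

/* Hypothetical: true if every byte is non-zero; exits on the first zero. */
bool bool_all(const uint8_t *x, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, x + i, 8);
        if (has_zero_byte(w))
            return false;          /* some byte in this chunk is false */
    }
    for (; i < n; i++)             /* scalar tail */
        if (!x[i])
            return false;
    return true;
}
```

A single chunk test replaces eight per-byte branches, and the early exit means a reduction over a mostly-true (for `any`) or mostly-false (for `all`) array touches only a prefix of the data.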
NumPy has SIMD versions of BOOL `logical_and`, `logical_or`, `logical_not`, and `absolute` for SSE2. The changes here replace that implementation with one that uses NumPy's universal intrinsics. This allows other architectures to have SIMD versions of the functions too. BOOL `logical_and` and `logical_or` are particularly important for NumPy, as that's how `np.any()` / `np.all()` are implemented.
Co-authored-by: Sayed Adel <seiko@imavr.com>
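To illustrate the kind of kernel involved, here is a minimal portable C sketch of the four BOOL loops. It is not NumPy's implementation. Instead of universal intrinsics it uses plain 64-bit words to process 8 bytes per step (a SWAR technique standing in for real vector registers). The names `bool_loop`, `bool_op` and `normalize8` are hypothetical. The sketch treats any non-zero input byte as true and always writes 0 or 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL
#define LOW7 0x7F7F7F7F7F7F7F7FULL

typedef enum { OP_AND, OP_OR, OP_NOT, OP_ABS } bool_op;   /* hypothetical */

/* Map each byte of w to 0x00 (byte was zero) or 0x01 (byte was non-zero). */
static uint64_t normalize8(uint64_t w)
{
    /* (b & 0x7F) + 0x7F never exceeds 0xFE, so no carry crosses bytes;
       the top bit of each byte ends up set iff that byte was non-zero. */
    uint64_t t = ((w & LOW7) + LOW7) | w;
    return (t >> 7) & ONES;
}

static uint64_t apply8(bool_op op, uint64_t a, uint64_t b)
{
    switch (op) {
    case OP_AND: return normalize8(a) & normalize8(b);
    case OP_OR:  return normalize8(a | b);      /* non-zero iff either is */
    case OP_NOT: return normalize8(a) ^ ONES;
    default:     return normalize8(a);          /* OP_ABS: |bool| == bool */
    }
}

/* Elementwise loop over n bools; b may be NULL for the unary ops. */
void bool_loop(bool_op op, const uint8_t *a, const uint8_t *b,
               uint8_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {                /* 8 bytes per step */
        uint64_t wa, wb = 0, wr;
        memcpy(&wa, a + i, 8);
        if (b)
            memcpy(&wb, b + i, 8);
        wr = apply8(op, wa, wb);
        memcpy(out + i, &wr, 8);
    }
    for (; i < n; i++) {                        /* scalar tail */
        uint8_t x = a[i] != 0;
        uint8_t y = b ? (b[i] != 0) : 0;
        switch (op) {
        case OP_AND: out[i] = x && y; break;
        case OP_OR:  out[i] = x || y; break;
        case OP_NOT: out[i] = !x;     break;
        default:     out[i] = x;      break;
        }
    }
}
```

Because every step combines bytes independently, the result does not depend on endianness. A vector-register version would apply the same idea to 16- to 64-byte registers per step.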
Branch updated from d1ff56b to b067a42.
Excellent work, just one more nit.
Requested changes implemented, thanks!
My apologies, it seems I wasn't entirely focused. One more thing.
Branch updated from 7bb525f to 468a3da.
LGTM, thank you.
I have made some additional changes to the umath generator to enable runtime dispatching for the following aliases:
BOOL_invert, BOOL_add, BOOL_bitwise_and, BOOL_bitwise_or, BOOL_logical_xor, BOOL_bitwise_xor, BOOL_multiply, BOOL_maximum, BOOL_minimum, BOOL_fmax, BOOL_fmin
Branch updated from 468a3da to c47e5ff.
The following functions are defined by the umath generator to enable runtime dispatching without the need to redefine them within dispatchable sources: BOOL_invert, BOOL_add, BOOL_bitwise_and, BOOL_bitwise_or, BOOL_logical_xor, BOOL_bitwise_xor, BOOL_multiply, BOOL_maximum, BOOL_minimum, BOOL_fmax, BOOL_fmin
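Runtime dispatching means the same kernel is built for several instruction sets and the best one the CPU supports is selected when the program runs. The following is a minimal C sketch of that general pattern. It assumes a GCC/Clang toolchain on x86 for the `target("avx2")` function attribute and `__builtin_cpu_supports`. On other platforms it always uses the baseline loop. The names `init_bool_dispatch` and `bool_logical_and` are hypothetical, and this is not how NumPy's umath generator or dispatch-able sources are actually structured.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#  define HAVE_X86_DISPATCH 1
#  define TARGET_AVX2 __attribute__((target("avx2")))
#else
#  define HAVE_X86_DISPATCH 0
#  define TARGET_AVX2
#endif

typedef void (*bool_binop_fn)(const uint8_t *, const uint8_t *, uint8_t *, size_t);

/* Baseline kernel: compiled with the default flags, runs on any CPU. */
static void bool_and_baseline(const uint8_t *a, const uint8_t *b,
                              uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] != 0) && (b[i] != 0);
}

/* Same source; on x86 it is compiled for AVX2 so the auto-vectorizer may
   use wider registers. It must only be called if the CPU supports AVX2. */
TARGET_AVX2
static void bool_and_avx2(const uint8_t *a, const uint8_t *b,
                          uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] != 0) && (b[i] != 0);
}

static int cpu_has_avx2(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2");
#else
    return 0;
#endif
}

/* Defaults to the baseline, so calls before initialization are still safe. */
static bool_binop_fn bool_and_impl = bool_and_baseline;

/* Hypothetical: call once, e.g. at module load. */
void init_bool_dispatch(void)
{
    bool_and_impl = cpu_has_avx2() ? bool_and_avx2 : bool_and_baseline;
}

/* Hypothetical public entry point: callers never pick the variant. */
void bool_logical_and(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    bool_and_impl(a, b, out, n);
}
```

Resolving the function pointer once, instead of checking CPU features on every call, keeps the per-call cost to a single indirect call.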
Branch updated from c47e5ff to bfa444d.
The last push after the approval was to satisfy the Python linter and to fix the dispatch kwarg in the umath generator for …
@mattip Is anything else missing or required?
Thanks @Developer-Ecosystem-Engineering and @seiko2plus!
@Developer-Ecosystem-Engineering if the benchmarks at the top of the PR are not the latest ones, could you post updated ones here as a footnote to this PR?
Note that this showed up as a bit naughty in …
NumPy has SIMD versions of BOOL `logical_and`, `logical_or`, `logical_not`, and `absolute` for SSE2. The changes here replace that implementation with one that uses universal intrinsics. This allows other architectures to have SIMD versions of the functions too. BOOL `logical_and` and `logical_or` are particularly important for NumPy, as that's how `np.any()` / `np.all()` are implemented.

- Apple M1: up to 16.5x faster
- Apple M1 (Rosetta): up to 1.6x faster
- iMac Pro (AVX512): up to 1.2x faster