ENH, SIMD: Replace libdivide functions of signed integer division with universal intrinsics #18766
Each dispatched CPU feature should have a corresponding benchmark.
5e9589c to f96211e
close/reopen
e2e6908 to 4dc28da
What's the difference between numpy/numpy/core/src/umath/loops.h.src Line 612 in 99b396b
EDIT: My bad, saw that one of them is
4dc28da to 7651c58
@seiko2plus what say we keep this PR only for signed types? I'll take up timedelta in a new PR as it's slightly different with a need for
I think it's fine to handle the remaining work in another PR.
@seiko2plus, @Qiyu8 still have open "changes requested", among them to re-run the relevant benchmarks.
c056c42 to f505827
f505827 to b5193d7
- Revert unsigned integer division changes, overflow check only required by signed division.
- Fix floor round for reduce division
- cleanup
- revert fixme comment
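To illustrate why the overflow check and the floor rounding matter only for signed types, here is a minimal scalar sketch. The helper name `floor_div_i32` and its error codes are hypothetical and are not taken from NumPy's source; the sketch only shows the arithmetic, not the SIMD implementation in this PR.

```c
#include <stdint.h>

/* Hypothetical scalar reference for signed floor division.
 * err is set to 0 on success, 1 for division by zero,
 * 2 for the INT32_MIN / -1 overflow case. */
static int32_t floor_div_i32(int32_t a, int32_t b, int *err)
{
    *err = 0;
    if (b == 0) {
        *err = 1;
        return 0;
    }
    /* The only signed overflow: -INT32_MIN is not representable. */
    if (a == INT32_MIN && b == -1) {
        *err = 2;
        return INT32_MIN;
    }
    int32_t q = a / b; /* C division truncates toward zero */
    /* Truncation and floor differ only when the operands have
     * opposite signs and the division is inexact. */
    if (((a > 0) != (b > 0)) && (a % b != 0)) {
        q--;
    }
    return q;
}
```

Unsigned division needs neither branch: there is no negative quotient to round down and no `MIN / -1` case, which is consistent with reverting the unsigned changes.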
b5193d7 to 4619081
When the divisor is equal to the minimum integer value, it was affected by gcc 9.3 only, and only under certain conditions of aggressive optimization.
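For reference, the expected floor-division results when the divisor is the minimum integer value can be written down without dividing at all. The sketch below is an assumption-labeled illustration for 32-bit integers; the helper name `floor_div_by_min_i32` is hypothetical, and the sketch does not reproduce the gcc 9.3 behavior mentioned above. The case is delicate in general because the magnitude of `INT32_MIN` is not representable as an `int32_t`.

```c
#include <stdint.h>

/* Hypothetical reference: floor(a / INT32_MIN) computed by case analysis. */
static int32_t floor_div_by_min_i32(int32_t a)
{
    if (a == INT32_MIN) {
        return 1;  /* exact: MIN / MIN */
    }
    if (a > 0) {
        return -1; /* true quotient lies in (-1, 0), floor is -1 */
    }
    return 0;      /* a in (INT32_MIN, 0]: true quotient lies in [0, 1) */
}
```

A table like this can serve as a test oracle for any optimized path that precomputes values from the divisor.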
e4f575d to f74f500
I have made several fixes/improvements, explained in the latest commit log, and have also deleted the overflow-test patch since it's already included in PR #19046.
New benchmark results have been added to the PR description.
LGTM, Thank you Ganesh.
Thanks @ganesh-k13, @seiko2plus. The speedups are quite nice.
Thanks a lot, @seiko2plus!
Does anyone here have time to look into gh-20025? It seems like a pretty bad bug, and we should make sure to fix it before the 1.22 release at least.
Dispatch for signed floor division
Continues work of #18178 and #18075.
TODO items:
- [ ] Modify timedelta code where libdivide is used
- [ ] Purge process of libdivide. Any license-related changes needed(?)

CC: @seiko2plus, @seberg
Benchmarks
X86
CPU
OS
Linux seiko-pc 5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0
Benchmark
AVX2
SSE41
SSE3
Power little-endian
CPU
OS
Benchmark
VSX2
AArch64
CPU
OS
Linux localhost 4.14.113-seiko_fastboot #30 SMP PREEMPT Wed Dec 30 12:28:43 IST 2020 aarch64 aarch64 aarch64 GNU/Linux gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Benchmark
NEON
Initial benchmark (run before the latest modifications, on different x86 hardware than the one above)
Basic Benchmarks: