SIMD: Add partial/non-contig load and store intrinsics for 32/64-bit #17340
This seems to build on PR gh-16782, correct?
All tests pass. I will move the unit tests for the new intrinsics to #16782 so we can merge this PR.
…-bit This patch improves the implementation of memory load/store for VSX
@mattip, these intrinsics are already used by #17587 and #16247, where they have shown efficiency almost on par with the raw SIMD they replace in the case of AVX2 and AVX512F, and they provide massive improvements for non-contiguous memory access. I hope we can merge this pull request as soon as possible.
@seiko2plus I notice that you are still making commits here. Do you feel that there is more to do?
I was hoping to merge #16782 first, thinking that then we might be able to add some (maybe marked |
@charris, no, the last change I made on this PR was 17 days ago;
the other messages are due to building #16247 and #17587 on top of this PR (reference commit).
I totally agree with you: without test cases it would be chaos.
there's no need for |
Thanks Sayed.
This patch implements NPYV intrinsics for partial and non-contiguous memory access,
which paves the way to replace the raw SIMD kernels in simd.inc.src
with the universal intrinsics. Required by #16247.
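
For readers unfamiliar with the terms, the following is a minimal scalar sketch of what "partial" and "non-contiguous" memory access mean for a 4-lane vector of 32-bit integers. It is not the code from this patch: the `sketch_*` names, the `sketch_vec_s32` type, and the fixed lane count are hypothetical, chosen only for illustration, and a real implementation would map these operations to SIMD instructions instead of loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector width: 4 lanes of 32-bit integers. */
#define SKETCH_NLANES 4

/* Hypothetical stand-in for a SIMD register. */
typedef struct { int32_t lane[SKETCH_NLANES]; } sketch_vec_s32;

/* Partial load: read only the first `nlane` elements from `ptr`
 * and fill the remaining lanes with `fill`. Memory past
 * ptr[nlane - 1] is never touched. */
static sketch_vec_s32
sketch_load_till_s32(const int32_t *ptr, size_t nlane, int32_t fill)
{
    sketch_vec_s32 v;
    for (size_t i = 0; i < SKETCH_NLANES; ++i) {
        v.lane[i] = (i < nlane) ? ptr[i] : fill;
    }
    return v;
}

/* Partial store: write only the first `nlane` lanes to `ptr`. */
static void
sketch_store_till_s32(int32_t *ptr, size_t nlane, sketch_vec_s32 v)
{
    for (size_t i = 0; i < SKETCH_NLANES && i < nlane; ++i) {
        ptr[i] = v.lane[i];
    }
}

/* Non-contiguous load: lane i comes from ptr[i * stride],
 * where `stride` is measured in elements, not bytes. */
static sketch_vec_s32
sketch_loadn_s32(const int32_t *ptr, ptrdiff_t stride)
{
    sketch_vec_s32 v;
    for (ptrdiff_t i = 0; i < SKETCH_NLANES; ++i) {
        v.lane[i] = ptr[i * stride];
    }
    return v;
}

/* Non-contiguous store: lane i goes to ptr[i * stride]. */
static void
sketch_storen_s32(int32_t *ptr, ptrdiff_t stride, sketch_vec_s32 v)
{
    for (ptrdiff_t i = 0; i < SKETCH_NLANES; ++i) {
        ptr[i * stride] = v.lane[i];
    }
}
```

The partial variants let a kernel handle the tail of an array (fewer elements left than the vector has lanes) without a separate scalar loop, and the strided variants let the same kernel operate on arrays whose elements are not adjacent in memory.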