SIMD: Add partial/non-contig load and store intrinsics for 32/64-bit #17340

seiko2plus · 2020-09-17T10:28:23Z

This patch implements NPYV intrinsics for partial and non-contiguous memory access,
which paves the way to replace the raw SIMD kernels in simd.inc.src with the universal intrinsics.

required by #16247
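
For readers unfamiliar with the terms, here is a minimal scalar sketch in C of what "partial" and "non-contiguous" loads and stores mean. It is not the NPYV code from this patch: the `sketch_*` names, the fixed `LANES` width, and the signatures are all hypothetical and chosen only for illustration. The real intrinsics work on SIMD registers and their exact names and signatures are defined in the per-architecture memory.h headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a vector holds 4 lanes of 32-bit integers (a 128-bit register). */
#define LANES 4

/* Partial load: read only the first `nlane` elements from `src` and
 * set the remaining lanes to `fill`, so memory past the tail is never touched. */
static void sketch_load_till_s32(int32_t vec[LANES], const int32_t *src,
                                 size_t nlane, int32_t fill)
{
    assert(src != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        vec[i] = (i < nlane) ? src[i] : fill;
    }
}

/* Partial store: write only the first `nlane` lanes of `vec` to `dst`. */
static void sketch_store_till_s32(int32_t *dst, size_t nlane,
                                  const int32_t vec[LANES])
{
    assert(dst != NULL);
    for (size_t i = 0; i < LANES && i < nlane; ++i) {
        dst[i] = vec[i];
    }
}

/* Non-contiguous load: gather one element every `stride` elements. */
static void sketch_loadn_s32(int32_t vec[LANES], const int32_t *src,
                             ptrdiff_t stride)
{
    assert(src != NULL);
    for (ptrdiff_t i = 0; i < LANES; ++i) {
        vec[i] = src[i * stride];
    }
}

/* Non-contiguous store: scatter the lanes `stride` elements apart. */
static void sketch_storen_s32(int32_t *dst, ptrdiff_t stride,
                              const int32_t vec[LANES])
{
    assert(dst != NULL);
    for (ptrdiff_t i = 0; i < LANES; ++i) {
        dst[i * stride] = vec[i];
    }
}
```

In a kernel, the partial variants would typically handle the last few elements of an array whose length is not a multiple of the vector width, and the strided variants would handle arrays that are not contiguous in memory.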

numpy/core/src/common/simd/avx2/memory.h

mattip · 2020-10-07T08:43:48Z

This seems to build on PR gh-16782, correct?

seiko2plus · 2020-10-07T09:07:19Z

@mattip, yes, this pull request temporarily merges #16782 so that I can test the new intrinsics.

seiko2plus · 2020-10-09T00:00:35Z

All tests pass. I will move the unit tests for the new intrinsics to #16782 so we can merge this PR.
https://travis-ci.org/github/numpy/numpy/builds/733983312
https://github.com/numpy/numpy/pull/17340/checks?check_run_id=1226297258

numpy/core/src/common/simd/avx512/memory.h

…-bit This patch improves the implementation of memory load/store for VSX

seiko2plus · 2020-10-21T18:34:24Z

@mattip, these intrinsics are already used by #17587 and #16247. On AVX2 and AVX512F their efficiency is almost the same as the raw SIMD they replace. On SSE and VSX they provide massive improvements for non-contiguous memory access. NEON/ASIMD, on the other hand, shows acceptable but less impressive improvements.

I hope we can merge this pull-request as soon as possible.

charris · 2020-10-25T16:58:52Z

@seiko2plus I notice that you are still making commits here. Do you feel that there is more to do?

mattip · 2020-10-25T17:09:39Z

I was hoping to merge #16782 first, thinking that then we might be able to add some (maybe marked @slow) tests using that infrastructure here. Does that make sense?

seiko2plus · 2020-10-25T17:42:51Z

@charris, no, the last change I made to this PR was 17 days ago:

 seiko2plus force-pushed the seiko2plus:npyv_partial_noncont_mem branch from bec733b to 1b8637d 17 days ago

The other messages appear because #16247 and #17587 are built on top of this PR (they reference its commit).

@mattip,

I was hoping to merge #16782 first,

I totally agree with you; without test cases it would be chaos.

thinking that then we might be able to add some (maybe marked @slow) tests using that infrastructure here

There's no need for @slow: the tests in #16782 run very quickly, currently taking 1 to 5 seconds depending on
the enabled SIMD extensions. The only issues are the binary size and maybe the build time.

charris · 2020-10-25T20:48:04Z

Thanks Sayed.

seiko2plus marked this pull request as draft September 17, 2020 10:29

seiko2plus force-pushed the npyv_partial_noncont_mem branch 11 times, most recently from ed975c8 to b699b95 September 25, 2020 00:47

mattip added the component: SIMD label Sep 25, 2020

Qiyu8 reviewed Sep 28, 2020

seiko2plus force-pushed the npyv_partial_noncont_mem branch from b699b95 to b7761ba October 7, 2020 08:32

github-actions bot added the 25 - WIP label Oct 7, 2020

seiko2plus force-pushed the npyv_partial_noncont_mem branch 4 times, most recently from 19fd9fd to e7e4699 October 8, 2020 13:44

seiko2plus mentioned this pull request Oct 8, 2020

ENH:Umath Replace raw SIMD of unary float point(32-64) with NPYV - g0 #16247

Merged

11 tasks

seiko2plus force-pushed the npyv_partial_noncont_mem branch from 0744337 to 24b5841 October 9, 2020 00:03

seiko2plus marked this pull request as ready for review October 9, 2020 00:03

seiko2plus changed the title ~~WIP:SIMD: Add partial/non-contig load and store intrinsics for 32/64-bit~~ SIMD: Add partial/non-contig load and store intrinsics for 32/64-bit Oct 9, 2020

ENH, SIMD: Add partial/non-contig load and store intrinsics for 32/64…

1b8637d

…-bit This patch improves the implementation of memory load/store for VSX

seiko2plus force-pushed the npyv_partial_noncont_mem branch from bec733b to 1b8637d October 9, 2020 00:18

seiko2plus mentioned this pull request Oct 9, 2020

ENH, TST: Bring the NumPy C SIMD vectorization interface "NPYV" to Python #16782

Merged

7 tasks

seiko2plus mentioned this pull request Oct 19, 2020

SIMD: Replace raw SIMD of sin/cos with NPYV(universal intrinsics) #17587

Merged

5 tasks

charris added 01 - Enhancement and removed 25 - WIP labels Oct 25, 2020

charris merged commit fcba5a6 into numpy:master Oct 25, 2020

Qiyu8 mentioned this pull request Nov 11, 2020

Optimize the performance of rot by using universal intrinsics OpenMathLib/OpenBLAS#2983

Merged

seiko2plus deleted the npyv_partial_noncont_mem branch January 9, 2021 17:01
