ENH: Enable AVX2/AVX512 support to numpy #10251
Conversation
@juliantaylor thoughts?
Also, can you expand on the logic behind generating multiple .so's for each
extension module? How does python know which one to load? (*Does* it even
load the fancier versions ever?) What's the approximate speedup, and what's
the approximate effect on download size? Does this also affect downstream
projects using numpy.distutils?
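To make the multiple-.so question concrete: on stock CPython the import system only ever loads the base `mtrand.cpython-36m-x86_64-linux-gnu.so`; the `.avx2`/`.avx512` suffixed variants are only picked up by a hwcaps-style patched dynamic loader, as Clear Linux ships. A minimal sketch of that selection logic, assuming we read CPU flags ourselves (`pick_variant` and `read_cpu_flags` are hypothetical helpers, not numpy or loader API):

```python
def pick_variant(basename, cpu_flags):
    """Return the shared-object filename a hwcaps-style loader would prefer.

    cpu_flags: a set of lower-case CPU feature strings (as found on the
    'flags' line of /proc/cpuinfo on Linux). The most capable variant
    supported by the CPU wins; otherwise the baseline .so is used.
    """
    if "avx512f" in cpu_flags:
        return basename + ".avx512"
    if "avx2" in cpu_flags:
        return basename + ".avx2"
    return basename


def read_cpu_flags(cpuinfo_text):
    """Parse the 'flags' line of /proc/cpuinfo-style text into a set."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


base = "mtrand.cpython-36m-x86_64-linux-gnu.so"
print(pick_variant(base, {"sse2", "avx2"}))             # → ...so.avx2
print(pick_variant(base, {"sse2", "avx2", "avx512f"}))  # → ...so.avx512
print(pick_variant(base, {"sse2"}))                     # → base name unchanged
```

Without such a loader, the fancier builds sit on disk unused, which is exactly the concern raised above.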
On Dec 20, 2017 1:11 PM, "Victor Rodriguez" <notifications@github.com> wrote:
This patch enables AVX2/AVX-512F instructions and distutils flags to
maximise the use of IA technology such as Haswell and Skylake platforms
in the math functions of numpy
Compiled with :
python3 setup.py build -b py3 --fcompiler=gnu95
Examples of the results generated are:
mtrand.cpython-36m-x86_64-linux-gnu.so
mtrand.cpython-36m-x86_64-linux-gnu.so.avx2
mtrand.cpython-36m-x86_64-linux-gnu.so.avx512
Which make proper use of ZMM / YMM registers and FMA instructions
Signed-off-by: Arjan van de Ven arjan@linux.intel.com
Signed-off-by: William Douglas william.douglas@intel.com
Signed-off-by: Victor Rodriguez victor.rodriguez.bahena@intel.com
------------------------------
You can view, comment on, or merge this pull request online at:
#10251
Commit Summary
- ENH: Enable AVX2/AVX512 support to numpy
File Changes
- *M* numpy/core/src/umath/simd.inc.src
<https://github.com/numpy/numpy/pull/10251/files#diff-0> (195)
- *M* numpy/distutils/fcompiler/__init__.py
<https://github.com/numpy/numpy/pull/10251/files#diff-1> (35)
- *M* numpy/distutils/unixccompiler.py
<https://github.com/numpy/numpy/pull/10251/files#diff-2> (4)
Patch Links:
- https://github.com/numpy/numpy/pull/10251.patch
- https://github.com/numpy/numpy/pull/10251.diff
It would be better to check for AVX at runtime (a check on import would have negligible cost) and then dynamically select the code path.
There are two separate things in this PR (which is a bit confusing):
In terms of performance, for very basic vectorized math, AVX2 is a theoretical 2x increase over SSE, and AVX512 is another theoretical 2x. For code where multiply operations can be fused with add operations, AVX2 has an additional theoretical 2x to gain over SSE. Theoretical gains are not realized gains, but for basic functions like those in simd.inc.src, it's pretty typical to get 90% or more of theoretical (it depends a bit on how big the arrays are).
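The theoretical factors quoted above follow directly from vector widths: SSE registers are 128 bits, AVX2 registers 256, AVX-512 registers 512, so each step doubles the number of 32-bit float lanes per instruction, and FMA fuses a multiply and an add into one instruction. A quick sanity check of that arithmetic:

```python
def lanes(vector_bits, element_bits=32):
    """Number of elements processed per vector instruction."""
    return vector_bits // element_bits


sse, avx2, avx512 = lanes(128), lanes(256), lanes(512)
print(sse, avx2, avx512)  # → 4 8 16 float32 lanes
print(avx2 // sse)        # AVX2 over SSE: 2x
print(avx512 // avx2)     # AVX-512 over AVX2: another 2x
# With FMA, a multiply+add pair becomes one instruction, doubling peak
# FLOPs again for fusable code: 2 * (avx2 // sse) == 4x over plain SSE.
```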
If this PR is restricted to (2), then IMO it's mergeable. It would be a nice enhancement to add runtime checks, though, because the wheel package format doesn't allow specifying a processor architecture, meaning that most people probably won't run this code.
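The wheel-format limitation is visible in the filename itself: the PEP 427 platform tag encodes only OS and base architecture, with no field for micro-architecture features like AVX2. A small illustration (the filename below is a typical example, not one published by this PR):

```python
def wheel_tags(filename):
    """Split a simple PEP 427 wheel filename into its components.

    Assumes the basic five-part form with no build tag:
    {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    """
    name, version, python_tag, abi_tag, platform_tag = (
        filename[: -len(".whl")].split("-")
    )
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


tags = wheel_tags("numpy-1.15.0-cp36-cp36m-manylinux1_x86_64.whl")
print(tags["platform"])  # → manylinux1_x86_64: no ISA-level information
```

Because `manylinux1_x86_64` says nothing about AVX support, a published wheel must either target the lowest common denominator or dispatch at runtime.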
@VictorRodriguez It looks like you got hit with MSVC 2008. Just wrap your code like this:
@xoviat thanks a lot for the help; after multiple experiments, it is passing all the tests.
LGTM.
@VictorRodriguez Could you add an entry to the 1.15 release notes under "Improvements"?
This patch enables AVX2/AVX-512F instructions and distutils flags to maximise the use of IA technology such as Haswell and Skylake platforms in the math functions of numpy. Signed-off-by: Arjan van de Ven arjan@linux.intel.com Signed-off-by: William Douglas william.douglas@intel.com Signed-off-by: Victor Rodriguez victor.rodriguez.bahena@intel.com
@charris I updated the PR with a description under doc/release/1.15.0-notes.rst, is that ok?
Thanks @VictorRodriguez.
on what type of cpu do you see 90% of the theoretical gains? |
Also, this is broken: the overlap checks have not been adapted for the larger vector sizes.
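The overlap concern can be illustrated: numpy's simd loops guard against input and output arrays that alias each other within one vector's worth of bytes, and a guard written for 16-byte SSE vectors is too weak once 32- or 64-byte vectors are emitted. A simplified model of such a check (not numpy's actual code, which lives in simd.inc.src; exact in-place aliasing is treated as safe for elementwise ops):

```python
def safe_for_vector(out_addr, in_addr, vec_bytes):
    """True if a vec_bytes-wide store at out_addr cannot clobber a
    pending vec_bytes-wide load from in_addr.

    Simplified aliasing model: fully in-place (equal addresses) is fine
    for elementwise operations; otherwise the addresses must be at
    least one vector width apart.
    """
    return out_addr == in_addr or abs(out_addr - in_addr) >= vec_bytes


# A 24-byte offset between input and output is safe for 16-byte SSE
# vectors...
print(safe_for_vector(1024, 1000, 16))  # → True
# ...but not for 32-byte AVX2 vectors: an unadapted SSE-era check would
# wrongly take the vectorized path here.
print(safe_for_vector(1024, 1000, 32))  # → False
```

This is why widening the vectors without touching the guards can silently produce wrong results on partially overlapping arrays.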
@juliantaylor Is the overlap problem fixed by your recent PR? |
@juliantaylor thanks a lot for your feedback. Letting the compiler do this is risky, since standard compilers don't do it in the specific way we are doing here: we are using intrinsics from immintrin.h to load and execute AVX instructions. Also, this patch closes the gap that if you compile numpy for avx2 (or 512) today, say with -march=native, you get the SSE code for the simd functions even though the rest of the code gets AVX2. If the overlap checks have not been adapted, I will be happy to fix them. Since users are always looking for better performance in numpy applications, I recommend leaving this in for the next release and gathering feedback from users. It is not broken and works fine, since Clear Linux uses the same approach; objdump -d will show it. I will upload some numbers soon (https://github.com/clearlinux-pkgs/numpy)
You are not doing anything special here; this is the most trivial of vectorized code, which compilers have been able to vectorize for a very long time. At the very least, gcc produces equivalent machine code.
This reverts commit bcf949b.