ENH: improve runtime detection of CPU features #13421
@rgommers Just a heads up.
@seiko2plus I assume that you will be reusing some code from CuPy, which is OK because the CuPy MIT license is compatible with the BSD license we use. Just want to make sure you are aware of potential license issues down the line in case it comes up.
There is no intention to reuse any code from CuPy, at least for now; maybe I should take a look there. However, this patch is inspired by OpenCV.
Sure, I do.
This PR conflicts with gh-13516 pretty badly; I can't combine the two. This PR seems useful in its own right, independent of what we end up doing in gh-13516. @seiko2plus could you comment on the interaction between the two? Why doesn't gh-13516 rely on this PR, for example? Should we merge this first and then rebase gh-13516 on it?
- Put the old CPU detection code to rest

  The current CPU detection code only supports x86 and relies on compiler built-in functions that are not widely supported by other compilers or platforms.

  NOTE: `npy_cpu_supports` is removed rather than deprecated; use the macro `NPY_CPU_HAVE(FEATURE_NAME_WITHOUT_QUOTES)` instead.

- Initialize the new CPU features runtime detector

  Similar to the GCC built-in functions: instead of `__builtin_cpu_init` and `__builtin_cpu_supports`, it provides `npy_cpu_init`, `npy_cpu_have` and `NPY_CPU_HAVE`.

  NOTE: `npy_cpu_init` must be called before any use of `npy_cpu_have` or `NPY_CPU_HAVE`. However, `npy_cpu_init` is already called when the `umath` module is loaded, so in most cases there is no reason to call it again.

- Add X86 support

  Detects almost all x86 features, and also provides CPU feature groups that gather several features, e.g. `AVX512_KNM` detects Knights Mill's `AVX512` features.

- Add IBM/Power support

  Only supports Linux and relies on `glibc(getauxval)` to detect VSX support; falls back to the compiler definitions on other platforms.

- Add ARM support

  Same as IBM/Power, but it parses `/proc/self/auxv` if `glibc(getauxval)` isn't available.

- Update umath generator

- Add unit tests, for Linux only

- Add new attribute `__cpu_features__` to the umath module

  `__cpu_features__` is a dictionary that contains all supported CPU feature names along with their runtime availability.
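The `/proc/self/auxv` fallback mentioned for ARM can be illustrated with a small, hedged sketch. This is not the code from the patch (which is written in C); it only shows the general idea of reading the kernel's auxiliary vector as a sequence of (type, value) word pairs and picking out the hardware capability masks. The function names `parse_auxv` and `read_hwcaps` are hypothetical; the constants `AT_NULL`, `AT_HWCAP` and `AT_HWCAP2` are the standard Linux ELF auxiliary vector types.

```python
import struct

# Standard Linux auxiliary vector entry types (from <elf.h>).
AT_NULL = 0     # terminates the vector
AT_HWCAP = 16   # first hardware capability bit mask
AT_HWCAP2 = 26  # second hardware capability bit mask


def parse_auxv(raw, entry_format="@LL"):
    """Parse raw auxv bytes into a {type: value} dict.

    Hypothetical helper: each entry is a pair of native unsigned longs,
    and parsing stops at the AT_NULL terminator.
    """
    entry = struct.Struct(entry_format)
    values = {}
    for offset in range(0, len(raw) - entry.size + 1, entry.size):
        key, value = entry.unpack_from(raw, offset)
        if key == AT_NULL:
            break
        values[key] = value
    return values


def read_hwcaps(path="/proc/self/auxv"):
    """Return (hwcap, hwcap2), or (0, 0) if the file cannot be read."""
    try:
        with open(path, "rb") as f:
            auxv = parse_auxv(f.read())
    except OSError:
        # Not Linux, or procfs is unavailable: report no capabilities.
        return 0, 0
    return auxv.get(AT_HWCAP, 0), auxv.get(AT_HWCAP2, 0)


if __name__ == "__main__":
    hwcap, hwcap2 = read_hwcaps()
    print(f"HWCAP={hwcap:#x} HWCAP2={hwcap2:#x}")
```

Individual features are then tested by masking bits of the returned values; the meaning of each bit is architecture-specific and is defined by the kernel headers.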
Thanks @seiko2plus
Great to see this merged, thanks @seiko2plus!
Wow, finally! Thank you all :)
NPY_CPU_FEATURE_AVX512VBMI2 = 43,
NPY_CPU_FEATURE_AVX512BITALG = 44,

// X86 CPU Groups
@seberg, already explained here
Thanks @seiko2plus. This PR is exactly what I need, as mentioned in scipy/scipy#11482. Now I'm ready to optimize NumPy on ARM-based platforms.
Thank you for merging this, but the fix appears not to have been included in version 1.18.2 released last week.
@ryandesign it wasn't supposed to be included in 1.18.2. That was a bug-fix-only release, and this is a new enhancement. It will be in 1.19.0.
I was afraid you were going to say something like that. I do not consider it an enhancement; I consider it a bug fix. Without it, we cannot build anything that uses numpy on macOS 10.12.
The issue is that it is a fairly large change, so it is hard to be sure that there is no regression for anyone, and regressions are especially bad in bug-fix releases... We want anyone to be able to update to a new 1.18 release without really thinking about it.
DOC: Move misplaced news fragment for gh-13421
This pull request aims to improve the runtime detection of CPU features.

The solution:

Implement an independent API similar to the GCC built-in functions: instead of `__builtin_cpu_init` and `__builtin_cpu_supports`, it provides `npy_cpu_init`, `npy_cpu_have` and `NPY_CPU_HAVE`.

For X86:

Detects almost all `X86` features via the `CPUID` instruction, also checks OS support for `AVX`/`AVX512`, and provides CPU feature groups that gather several features, e.g. `AVX512_KNM` detects Knights Mill's `AVX512` features.

For IBM/Power:

Only supports `linux` and relies on `glibc(getauxval)` to detect `VSX` support; falls back to the compiler definitions on other platforms.

For ARM:

Same as IBM/Power, but it parses `/proc/self/auxv` if `glibc(getauxval)` isn't available.

NOTES:

- `npy_cpu_supports` is removed rather than deprecated; use the macro `NPY_CPU_HAVE(FEATURE_NAME_WITHOUT_QUOTES)` instead.
- New attribute `__cpu_features__` added to the umath module; it is a dictionary that contains all supported CPU feature names along with their runtime availability.
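As a rough illustration of the `__cpu_features__` attribute, the sketch below tries to read the dictionary from Python and list the features reported as available. The private module paths it probes (`numpy._core._multiarray_umath` and `numpy.core._multiarray_umath`) are assumptions that depend on the installed NumPy version, and the helper names `load_cpu_features` and `enabled_features` are hypothetical, not part of NumPy.

```python
import importlib


def load_cpu_features():
    """Return a copy of NumPy's CPU feature dict, or {} if it is not found.

    Assumption: the attribute lives on a private umath extension module
    whose import path differs between NumPy versions.
    """
    candidates = (
        "numpy._core._multiarray_umath",
        "numpy.core._multiarray_umath",
    )
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        features = getattr(module, "__cpu_features__", None)
        if features is not None:
            return dict(features)
    return {}


def enabled_features(features):
    """Return the sorted names of features whose value is truthy."""
    return sorted(name for name, available in features.items() if available)


if __name__ == "__main__":
    print(enabled_features(load_cpu_features()))
```

Because the dictionary maps every supported feature name to a boolean, a feature that is known to NumPy but missing on the running CPU shows up with a false value rather than being absent.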