ENH: improve runtime detection of CPU features #13421

Merged: 2 commits merged into numpy:master from the core_improve_infa_runtime branch on Feb 5, 2020

Conversation

seiko2plus
Member

@seiko2plus seiko2plus commented Apr 29, 2019

This pull-request aims to improve the runtime detection of CPU features.

The cause:

The current CPU detection code has limited support for x86 features, and it relies on compiler built-in functions that are not widely supported by other compilers or platforms.

The solution:

Implement an independent API similar to the GCC built-in functions: instead of __builtin_cpu_init and __builtin_cpu_supports, it provides npy_cpu_init, npy_cpu_have and NPY_CPU_HAVE.

For X86:

Detects almost all x86 features via the CPUID instruction, checks OS support for AVX/AVX512, and provides CPU feature groups that gather several features, e.g. AVX512_KNM detects Knights Mill's AVX512 features.
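The idea of a feature group can be sketched in a few lines: a group counts as available only when every one of its member features was detected. This is a minimal illustration, not the patch's C code; the `FEATURE_GROUPS` table, the `group_available` helper and the member list given for AVX512_KNM are assumptions made for the example.

```python
# Hypothetical membership table. The member list is an assumption made for
# illustration and is not copied from the patch.
FEATURE_GROUPS = {
    "AVX512_KNM": (
        "AVX512F", "AVX512CD", "AVX512ER", "AVX512PF",
        "AVX5124FMAPS", "AVX5124VNNIW", "AVX512VPOPCNTDQ",
    ),
}


def group_available(detected, group):
    """Return True only if every member feature of `group` was detected."""
    return all(detected.get(name, False) for name in FEATURE_GROUPS[group])


# Every member detected: the group is reported as available.
detected = {name: True for name in FEATURE_GROUPS["AVX512_KNM"]}
print(group_available(detected, "AVX512_KNM"))

# One member missing: the whole group is reported as unavailable.
detected["AVX5124FMAPS"] = False
print(group_available(detected, "AVX512_KNM"))
```

Treating a group as the logical AND of its members lets calling code test one name instead of repeating the full list at every dispatch site.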

For IBM/Power:

Only Linux is supported: detection relies on glibc (getauxval) to check for VSX support and falls back to the compiler definitions on other platforms.
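The lookup described here can be tried from Python through `ctypes`. The sketch below is not the patch's C code: the helper names and the `compile_time_default` parameter are hypothetical, and the parameter only stands in for the fallback to compiler definitions. `AT_HWCAP` and `PPC_FEATURE_HAS_VSX` use the values from the Linux headers, and the VSX bit is meaningful only on PowerPC.

```python
import ctypes

AT_HWCAP = 16                     # auxiliary-vector key for the capability mask
PPC_FEATURE_HAS_VSX = 0x00000080  # VSX bit in AT_HWCAP on Linux/PowerPC


def hwcap_from_getauxval():
    """Return the AT_HWCAP mask via getauxval(), or None if unavailable."""
    try:
        libc = ctypes.CDLL(None)    # symbols already loaded in this process
        getauxval = libc.getauxval  # AttributeError if libc lacks getauxval
    except (OSError, AttributeError, TypeError):
        return None
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]
    return getauxval(AT_HWCAP)


def has_vsx(hwcap, compile_time_default=False):
    """Interpret the mask; use the stand-in default when there is no mask."""
    if hwcap is None:
        return compile_time_default
    return bool(hwcap & PPC_FEATURE_HAS_VSX)


# On a non-PowerPC machine bit 0x80 has a different meaning, so the printed
# value is only informative on Linux/PowerPC.
print(has_vsx(hwcap_from_getauxval()))
```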

For ARM:

Same as IBM/Power, but it parses /proc/self/auxv if glibc (getauxval) isn't available.

NOTES:

  • npy_cpu_supports is removed rather than deprecated; use the macro NPY_CPU_HAVE(FEATURE_NAME_WITHOUT_QUOTES) instead.

  • A new attribute, __cpu_features__, is added to the umath module; it is a dictionary that contains all supported CPU feature names with their runtime availability.
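Reading the new dictionary from Python might look like the sketch below. The two module paths are assumptions (the private extension module is not a public API and its location differs between NumPy versions), and `load_cpu_features` and `enabled` are hypothetical helpers written for this example.

```python
import importlib

# Candidate module paths are assumptions: the private extension module has
# moved between NumPy versions, and neither path is a public API.
_CANDIDATES = ("numpy._core._multiarray_umath", "numpy.core._multiarray_umath")


def load_cpu_features():
    """Return a copy of the __cpu_features__ dict, or {} if it is not found."""
    for name in _CANDIDATES:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        features = getattr(module, "__cpu_features__", None)
        if features is not None:
            return dict(features)
    return {}


def enabled(features):
    """Names whose runtime availability flag is true, sorted alphabetically."""
    return sorted(name for name, available in features.items() if available)


# Prints the features NumPy detected on this machine, or [] without NumPy.
print(enabled(load_cpu_features()))
```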

@charris charris changed the title core: improve runtime detection of CPU features MAINT: improve runtime detection of CPU features Apr 29, 2019
@charris charris changed the title MAINT: improve runtime detection of CPU features ENH: improve runtime detection of CPU features Apr 29, 2019
@charris
Member

charris commented Apr 29, 2019

@rgommers Just a heads up.

@charris
Member

charris commented Apr 29, 2019

@seiko2plus I assume that you will be reusing some code from cupy, which is OK because the cupy MIT license is compatible with the BSD license we use. Just want to make sure you are aware of potential license issues down the line in case it comes up.

@seiko2plus
Member Author

@charris,

I assume that you will be reusing some code from cupy, which is OK because the cupy MIT license is compatible with the BSD license we use.

There is no intention to reuse any code from Cupy, at least for now; maybe I should take a look there. However, this patch is inspired by OpenCV.

Just want to make sure you are aware of potential license issues down the line in case it comes up.

Sure, I am.

@seiko2plus seiko2plus force-pushed the core_improve_infa_runtime branch from 750cb16 to 7cd84c2 Compare April 30, 2019 01:07
@seiko2plus seiko2plus marked this pull request as ready for review April 30, 2019 01:18
@seiko2plus seiko2plus force-pushed the core_improve_infa_runtime branch 2 times, most recently from 6f5657b to 8ee4322 Compare April 30, 2019 16:03
@seiko2plus seiko2plus force-pushed the core_improve_infa_runtime branch from 8ee4322 to 2638510 Compare May 10, 2019 08:46
@seiko2plus seiko2plus force-pushed the core_improve_infa_runtime branch from 2638510 to 05e97b5 Compare August 12, 2019 05:57
@seiko2plus seiko2plus force-pushed the core_improve_infa_runtime branch 2 times, most recently from 20f7cea to 4621fcf Compare August 22, 2019 12:11
@rgommers
Member

This PR conflicts with gh-13516 pretty badly, I can't combine the two. This PR seems useful in its own right, independent of what we end up doing in gh-13516. @seiko2plus could you comment on the interaction between the two? Why doesn't gh-13516 rely on this PR for example? Should we merge this first and then rebase gh-13516 on it?

@seiko2plus seiko2plus force-pushed the core_improve_infa_runtime branch 4 times, most recently from a4a4920 to e0e4ff8 Compare December 2, 2019 20:54
@seiko2plus seiko2plus force-pushed the core_improve_infa_runtime branch 6 times, most recently from 116a0e6 to 39b1e22 Compare February 3, 2020 22:47
seiko2plus and others added 2 commits February 5, 2020 05:09
  - Put the old CPU detection code to rest

    The current CPU detection code only supports x86, and
    it relies on compiler built-in functions that are not widely
    supported by other compilers or platforms.

    NOTE: `npy_cpu_supports` is removed rather than deprecated,
     use the macro `NPY_CPU_HAVE(FEATURE_NAME_WITHOUT_QUOTES)` instead.

  - Initialize the new CPU features runtime detector

    Similar to the GCC built-in functions:
    instead of `__builtin_cpu_init` and `__builtin_cpu_supports`,
    it provides `npy_cpu_init`, `npy_cpu_have` and `NPY_CPU_HAVE`.

    NOTE: `npy_cpu_init` must be called before any use of
    `npy_cpu_have` and `NPY_CPU_HAVE`. However, `npy_cpu_init`
    is already called while the module `umath` is loaded,
    so in most cases there's no reason to call it again.

  - Add X86 support

    Detects almost all x86 features and also provides
    CPU feature groups that gather several features,
    e.g. `AVX512_KNM` detects Knights Mill's `AVX512` features.

  - Add IBM/Power support

    Only Linux is supported: detection relies on `glibc(getauxval)`
    to check for VSX support and falls back to the compiler definitions
    on other platforms.

  - Add ARM support

    Same as IBM/Power, but it parses `/proc/self/auxv`
    if `glibc(getauxval)` isn't available.

  - Update umath generator

  - Add unit tests for Linux only

  - Add new attribute `__cpu_features__` to umath module

    `__cpu_features__` is a dictionary that contains all supported
    CPU feature names with their runtime availability
@seiko2plus seiko2plus force-pushed the core_improve_infa_runtime branch from 39b1e22 to 64f7074 Compare February 5, 2020 04:50
@mattip mattip merged commit fed1fb4 into numpy:master Feb 5, 2020
@mattip
Member

mattip commented Feb 5, 2020

Thanks @seiko2plus

@rgommers
Member

rgommers commented Feb 5, 2020

Great to see this merged, thanks @seiko2plus!

@seiko2plus
Member Author

wow finally thank you all :)

NPY_CPU_FEATURE_AVX512VBMI2 = 43,
NPY_CPU_FEATURE_AVX512BITALG = 44,

// X86 CPU Groups
Member Author

@seberg, already explained here

@Qiyu8
Member

Qiyu8 commented Feb 13, 2020

Thanks @seiko2plus. This PR is exactly what I need, as mentioned in scipy/scipy#11482. Now I'm ready to optimize NumPy on ARM-based platforms.

@seiko2plus
Member Author

@Qiyu8, please wait, there's a misunderstanding here: this PR doesn't provide any infrastructure for ARM, only runtime detection. You have to wait for #13516, thank you.

@ryandesign

Thank you for merging this, but the fix appears not to have been included in version 1.18.2 released last week.

@rgommers
Member

@ryandesign it wasn't supposed to be included in 1.18.2. That was a bug-fix-only release, and this is a new enhancement. It will be in 1.19.0.

@ryandesign

I was afraid you were going to say something like that. I do not consider it an enhancement; I consider it a bug fix. Without it, we cannot build anything that uses numpy on macOS 10.12.

@seberg
Member

seberg commented Mar 23, 2020

The issue is that it is a fairly large change, so it is hard to be sure that there is no regression for anyone, and regressions are especially bad in bug-fix releases... We want anyone to be able to update to a new 1.18 release without really thinking about it.

seberg added a commit to seberg/numpy that referenced this pull request Apr 30, 2020
charris added a commit that referenced this pull request May 1, 2020
DOC: Move misplaced news fragment for gh-13421
@mattip mattip added the component: SIMD label Jul 21, 2020
@seiko2plus seiko2plus deleted the core_improve_infa_runtime branch January 9, 2021 16:55
Labels
01 - Enhancement, component: numpy._core, component: SIMD
10 participants