-
-
Notifications
You must be signed in to change notification settings - Fork 11.2k
ENH, SIMD: Optimize the hyperbolic implementation based on Highway wr… #29211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…apper Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review based on just the first glance. AFAIK: we need HWY_ATTR for all the functions that use highway ops. From highway docs: each function that calls Highway ops (such as Load) must either be prefixed with HWY_ATTR...
numpy/_core/meson.build
Outdated
[ | ||
AVX512_SKX, [AVX2, FMA3], | ||
VSX4, VSX2, | ||
VSX2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove VSX4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove VSX4?
Hi, @r-devulap
Before my modification, NPY_SIMD_F64 was used to control DOUBLE_tanh. For VSX4, the macro NPY_SIMD_F64 evaluates to 0, so DOUBLE_tanh was implemented using scalar code. Therefore, no compilation error occurred.
After my modification, I switched to using NPY_HWY_F64 to control DOUBLE_tanh. For VSX4, NPY_HWY_F64 evaluates to 1, so DOUBLE_tanh is implemented using vectorized code. However, when compiling on a Power9 system (with -mcpu=power9), calling hwy::N_PPC10 functions leads to a compilation error due to target-specific option mismatches.
Therefore, removing VSX4 from numpy/_core/meson.build results in behavior consistent with the original implementation before my change.
https://www.ibm.com/support/pages/node/7155156
CI use gcc11.4 now , however processor is Power9 used by CI.
HWY_ATTR NPY_FINLINE vtype | ||
load_vector(type_t* src, npy_intp ssrc, npy_intp len){ | ||
auto D = hn::DFromV<vtype>(); | ||
HWY_INLINE vtype load_vector(type_t* src, npy_intp ssrc, npy_intp len){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't it require HWY_ATTR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
HWY_ATTR NPY_FINLINE void | ||
store_vector(vtype vec, type_t* dst, npy_intp sdst, npy_intp len){ | ||
auto D = hn::DFromV<vtype>(); | ||
HWY_INLINE void store_vector(vtype vec, type_t* dst, npy_intp sdst, npy_intp len){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: don't we need HWY_ATTR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Test environment: Banana Pi BPI-F3 (SpacemiT K1 8-core RISC-V chip) gcc version 14.2.0
Test command: spin bench --compare HEAD~1 -t bench_ufunc.TanhSpecific
Add class TanhSpecific in benchmarks/benchmarks/bench_ufunc.py for this test on Banana Pi BPI-F3
(numpy_venv) root@yangwang:/home/yangwang/numpy# git log -2
commit bbe89a87b37f86d4f00e2a93882ef8d642a8db2c (HEAD -> main)
Author: Wang Yang yangwang@iscas.ac.cn
Date: Fri Aug 22 16:19:36 2025 +0800
ENH, SIMD: Optimize the hyperbolic implementation based on Highway wrapper
commit 4cbc80036001af445db0c21e71d0707f64014258
Author: Wang Yang yangwang@iscas.ac.cn
Date: Fri Aug 22 16:15:42 2025 +0800
TST:Add np.float and np.double for tanh test
(numpy_venv) root@yangwang:/home/yangwang/numpy#
(numpy_venv) root@yangwang:/home/yangwang/numpy# spin bench --compare HEAD~1 -t bench_ufunc.TanhSpecific