-
-
Notifications
You must be signed in to change notification settings - Fork 11.2k
ENH, SIMD: Optimize the hyperbolic implementation based on Highway wr… #29211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…apper Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review based on just the first glance. AFAIK: we need HWY_ATTR for all the functions that use highway ops. From highway docs: each function that calls Highway ops (such as Load) must either be prefixed with HWY_ATTR...
numpy/_core/meson.build
Outdated
[ | ||
AVX512_SKX, [AVX2, FMA3], | ||
VSX4, VSX2, | ||
VSX2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove VSX4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove VSX4?
Hi, @r-devulap
Before my modification, NPY_SIMD_F64 was used to control DOUBLE_tanh. For VSX4, the macro NPY_SIMD_F64 evaluates to 0, so DOUBLE_tanh was implemented using scalar code. Therefore, no compilation error occurred.
After my modification, I switched to using NPY_HWY_F64 to control DOUBLE_tanh. For VSX4, NPY_HWY_F64 evaluates to 1, so DOUBLE_tanh is implemented using vectorized code. However, when compiling on a Power9 system (with -mcpu=power9), calling hwy::N_PPC10 functions leads to a compilation error due to target-specific option mismatches.
Therefore, removing VSX4 from numpy/_core/meson.build results in behavior consistent with the original implementation before my change.
https://www.ibm.com/support/pages/node/7155156
CI use gcc11.4 now , however processor is Power9.
HWY_ATTR NPY_FINLINE vtype | ||
load_vector(type_t* src, npy_intp ssrc, npy_intp len){ | ||
auto D = hn::DFromV<vtype>(); | ||
HWY_INLINE vtype load_vector(type_t* src, npy_intp ssrc, npy_intp len){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't it require HWY_ATTR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
HWY_ATTR NPY_FINLINE void | ||
store_vector(vtype vec, type_t* dst, npy_intp sdst, npy_intp len){ | ||
auto D = hn::DFromV<vtype>(); | ||
HWY_INLINE void store_vector(vtype vec, type_t* dst, npy_intp sdst, npy_intp len){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: don't we need HWY_ATTR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
No description provided.