Skip to content

ENH, SIMD: Optimize the hyperbolic implementation based on Highway wr… #29211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ixgbe
Copy link
Contributor

@ixgbe ixgbe commented Jun 16, 2025

No description provided.

…apper

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
Copy link
Member

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review based on just the first glance. AFAIK: we need HWY_ATTR for all the functions that use highway ops. From highway docs: each function that calls Highway ops (such as Load) must either be prefixed with HWY_ATTR...

[
AVX512_SKX, [AVX2, FMA3],
VSX4, VSX2,
VSX2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove VSX4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image During the build process on PPC64LE, the compiler command for the VSX4 dispatch target contains conflicting CPU flags. The build system seems to be appending the target-specific -mcpu=power10 flag without overriding the base -mcpu=power9 flag that was likely set for the baseline build. This results in a command with mutually exclusive options (-mcpu=power9 and -mcpu=power10), which causes the compiler to fail with a target specific option mismatch error during function inlining. Could you please confirm if this is a known issue or a bug in the Meson build configuration for ppc64le?Thanks !

Copy link
Contributor Author

@ixgbe ixgbe Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove VSX4?

Hi, @r-devulap
Before my modification, NPY_SIMD_F64 was used to control DOUBLE_tanh. For VSX4, the macro NPY_SIMD_F64 evaluates to 0, so DOUBLE_tanh was implemented using scalar code. Therefore, no compilation error occurred.

After my modification, I switched to using NPY_HWY_F64 to control DOUBLE_tanh. For VSX4, NPY_HWY_F64 evaluates to 1, so DOUBLE_tanh is implemented using vectorized code. However, when compiling on a Power9 system (with -mcpu=power9), calling hwy::N_PPC10 functions leads to a compilation error due to target-specific option mismatches.

Therefore, removing VSX4 from numpy/_core/meson.build results in behavior consistent with the original implementation before my change.
image
https://www.ibm.com/support/pages/node/7155156
CI use gcc11.4 now , however processor is Power9.

HWY_ATTR NPY_FINLINE vtype
load_vector(type_t* src, npy_intp ssrc, npy_intp len){
auto D = hn::DFromV<vtype>();
HWY_INLINE vtype load_vector(type_t* src, npy_intp ssrc, npy_intp len){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it require HWY_ATTR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

HWY_ATTR NPY_FINLINE void
store_vector(vtype vec, type_t* dst, npy_intp sdst, npy_intp len){
auto D = hn::DFromV<vtype>();
HWY_INLINE void store_vector(vtype vec, type_t* dst, npy_intp sdst, npy_intp len){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: don't we need HWY_ATTR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ixgbe ixgbe requested a review from r-devulap July 15, 2025 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants