-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Add SIMD implementation for heaviside #19780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
eb5c1e6
to
a92de0c
Compare
c822aa6
to
b71e0d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for my intrusion and for giving myself the right to interfere with your pull-request, but in the interest of your time and the general interest of NumPy I have corrected the path of your work.
We shouldn't implement new universal intrinsics unless:
- There's one or multiple of SIMD extensions that provide direct/indirect native hardware instructions for it.
- Commonly used, and consider it part of utilities.
And since Heaviside can be implemented directly via universal intrinsics and it doesn't fit one of the above points, therefore I think there's no need to be part of the interface, rather it should be moved directly to ufunc innerloop.
In the light of the above, I have initialized a new dispatchable source for general binary fp operations suitable for your heaviside implementation.
8bb49cd
to
ba39428
Compare
f980c66
to
9312d18
Compare
@seiko2plus I am wondering should I take |
038beb4
to
b8a3438
Compare
b8a3438
to
10fe327
Compare
I think this error caused by the scalar function, try to clear FP barrier by adding the following code after the scalar loop: npy_clear_floatstatus_barrier((char*)dimensions); regards to your current impl of heaviside, I think it need to get improved. Try something like that: const npyv_f32 zero = npyv_zero_f32();
const npyv_f32 one = npyv_setall_f32(1.0f);
npyv_b32 mask_nnan = npyv_notnan_f32(x);
npyv_b32 mask_zero = npyv_cmpeq_f32(x, zero);
// arithmetic right shift to the edge of the exponent
npyv_f32 shift_exp = npyv_reinterpret_f32_s32(
npyv_shri_s32(npyv_reinterpret_s32_f32(x), 8)
);
// x < 0 ? 0 : 1, since the fraction of 1.0 is zero.
// npyv_f32 zero_or_one = npyv_andc_f32(one, shift_exp); better to have intrin for `AND With Complement`, on x86 named `andnot`
npyv_f32 zero_or_one = npyv_and_f32(one, npyv_not_f32(shift_exp));
return npyv_select_f32(mask_zero, h, zero_or_one); |
@howjmay, try to use from numpy.core import _simd as simd
v = simd.baseline # for AVX2 use simd.FMA3__AVX2 or simd.targets['FMA3__AVX2']
print("the available targets ", simd.targets)
def heaviside_f32(x, h):
zero = v.zero_f32();
one = v.setall_f32(1.0);
mask_nnan = v.notnan_f32(x);
mask_zero = v.cmpeq_f32(x, zero);
# ....
# test your kernel
heaviside_f32(v.setall_f32(1), v.setall_f32(2)) |
No description provided.