Skip to content

ENH: Add SIMD implementation for heaviside #19780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

howjmay
Copy link
Contributor

@howjmay howjmay commented Aug 29, 2021

No description provided.

@howjmay howjmay force-pushed the simd-heaviside branch 29 times, most recently from eb5c1e6 to a92de0c Compare August 30, 2021 08:26
@howjmay howjmay marked this pull request as draft August 30, 2021 17:37
@howjmay howjmay force-pushed the simd-heaviside branch 2 times, most recently from c822aa6 to b71e0d2 Compare September 1, 2021 13:42
Copy link
Member

@seiko2plus seiko2plus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for my intrusion and for giving myself the right to interfere with your pull-request, but in the interest of your time and the general interest of NumPy I have corrected the path of your work.

We shouldn't implement new universal intrinsics unless:

  • There's one or multiple of SIMD extensions that provide direct/indirect native hardware instructions for it.
  • Commonly used, and consider it part of utilities.

And since Heaviside can be implemented directly via universal intrinsics and it doesn't fit one of the above points, therefore I think there's no need to be part of the interface, rather it should be moved directly to ufunc innerloop.

In the light of the above, I have initialized a new dispatchable source for general binary fp operations suitable for your heaviside implementation.

@seiko2plus seiko2plus added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Sep 4, 2021
@howjmay howjmay force-pushed the simd-heaviside branch 10 times, most recently from f980c66 to 9312d18 Compare October 2, 2021 08:51
@howjmay
Copy link
Contributor Author

howjmay commented Oct 2, 2021

@seiko2plus I am wondering should I take NaN and Inf into account? In the current implementation, I ensure the output of a NaN input is NaN. However, in the tests, I got error RuntimeWarning: invalid value encountered in heaviside.
Should we reject NaN and Inf as input value?

@howjmay howjmay force-pushed the simd-heaviside branch 3 times, most recently from 038beb4 to b8a3438 Compare October 2, 2021 09:31
@seiko2plus
Copy link
Member

@howjmay,

I am wondering should I take NaN and Inf into account?
yes, we should and your final implementation should be equivalent to the scalar function:

/**begin repeat
* #type = npy_float, npy_double, npy_longdouble#
* #c = f, ,l#
* #C = F, ,L#
*/
@type@ npy_heaviside@c@(@type@ x, @type@ h0)
{
if (npy_isnan(x)) {
return (@type@) NPY_NAN;
}
else if (x == 0) {
return h0;
}
else if (x < 0) {
return (@type@) 0.0;
}
else {
return (@type@) 1.0;
}
}

I got error RuntimeWarning: invalid value encountered in heaviside.
Should we reject NaN and Inf as input value?

I think this error caused by the scalar function, try to clear FP barrier by adding the following code after the scalar loop:

    npy_clear_floatstatus_barrier((char*)dimensions);

regards to your current impl of heaviside, I think it need to get improved. Try something like that:

    const npyv_f32 zero = npyv_zero_f32();
    const npyv_f32 one  = npyv_setall_f32(1.0f);
    npyv_b32 mask_nnan = npyv_notnan_f32(x);
    npyv_b32 mask_zero = npyv_cmpeq_f32(x, zero);
    // arithmetic right shift to the edge of the exponent
    npyv_f32 shift_exp  = npyv_reinterpret_f32_s32(
        npyv_shri_s32(npyv_reinterpret_s32_f32(x), 8)
    );
    // x < 0 ? 0 : 1, since the fraction of 1.0 is zero.
    // npyv_f32 zero_or_one = npyv_andc_f32(one, shift_exp); better to have intrin for `AND With Complement`, on x86 named `andnot`
    npyv_f32 zero_or_one = npyv_and_f32(one, npyv_not_f32(shift_exp));
    return npyv_select_f32(mask_zero, h, zero_or_one);

@seiko2plus
Copy link
Member

seiko2plus commented Nov 3, 2021

@howjmay, try to use _simd module during the design stage, it can your reduce time/efforts.

from numpy.core import _simd as simd
v = simd.baseline # for AVX2 use simd.FMA3__AVX2 or simd.targets['FMA3__AVX2']
print("the available targets ", simd.targets)

def heaviside_f32(x, h):
   zero = v.zero_f32();
   one  = v.setall_f32(1.0);
   mask_nnan = v.notnan_f32(x);
   mask_zero = v.cmpeq_f32(x, zero);
   # ....
# test your kernel
heaviside_f32(v.setall_f32(1), v.setall_f32(2))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants