ENH: Add SIMD implementation for heaviside #19780

howjmay · 2021-08-29T10:31:56Z

No description provided.

seiko2plus

I apologize for my intrusion and for giving myself the right to interfere with your pull-request, but in the interest of your time and the general interest of NumPy I have corrected the path of your work.

We shouldn't implement new universal intrinsics unless one of the following holds:

One or more SIMD extensions provide direct or indirect native hardware instructions for the operation.
The operation is commonly used and can be considered part of the utilities.

Since heaviside can be implemented directly with the existing universal intrinsics and meets neither of the above points, I think it does not need to be part of the interface; it should instead be moved directly into the ufunc inner loop.

In light of the above, I have set up a new dispatchable source for general binary fp operations, suitable for your heaviside implementation.

numpy/core/src/umath/loops_binary_fp.dispatch.c.src


howjmay · 2021-10-02T09:21:04Z

@seiko2plus I am wondering whether I should take NaN and Inf into account. In the current implementation, I ensure that a NaN input produces a NaN output. However, in the tests I get the error RuntimeWarning: invalid value encountered in heaviside.
Should we reject NaN and Inf as input values?

seiko2plus · 2021-11-03T15:36:57Z

@howjmay,

> I am wondering should I take NaN and Inf into account?

Yes, we should. Your final implementation should be equivalent to the scalar function:

numpy/numpy/core/src/npymath/npy_math_internal.h.src

Lines 580 to 600 in fae6fa4

    /**begin repeat
     * #type = npy_float, npy_double, npy_longdouble#
     * #c = f, ,l#
     * #C = F, ,L#
     */
    @type@ npy_heaviside@c@(@type@ x, @type@ h0)
    {
        if (npy_isnan(x)) {
            return (@type@) NPY_NAN;
        }
        else if (x == 0) {
            return h0;
        }
        else if (x < 0) {
            return (@type@) 0.0;
        }
        else {
            return (@type@) 1.0;
        }
    }
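For checking a vectorized kernel against this behavior, the branch order of the scalar function can be mirrored in a few lines of plain Python. This is only a sketch: the name heaviside_ref and the chosen test values are illustrative and are not part of NumPy.

```python
import math

def heaviside_ref(x: float, h0: float) -> float:
    """Hypothetical reference that mirrors the branch order of npy_heaviside."""
    if math.isnan(x):
        return math.nan   # a NaN input propagates, whatever h0 is
    if x == 0:
        return h0         # true for both +0.0 and -0.0
    if x < 0:
        return 0.0        # includes -inf
    return 1.0            # includes +inf

# Special values a SIMD kernel has to reproduce.
cases = [-math.inf, -2.5, -0.0, 0.0, 3.0, math.inf]
results = [heaviside_ref(x, 0.5) for x in cases]
print(results)  # [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
print(math.isnan(heaviside_ref(math.nan, 0.5)))  # True
```

Note that the infinities need no special handling: they fall through the ordinary sign comparison, so only NaN and zero are real special cases.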

> I got error RuntimeWarning: invalid value encountered in heaviside.
> Should we reject NaN and Inf as input value?

I think this error is caused by the scalar function. Try clearing the floating-point status flags by adding the following call after the scalar loop:

    npy_clear_floatstatus_barrier((char*)dimensions);

Regarding your current implementation of heaviside, I think it needs to be improved. Try something like this:

    const npyv_f32 zero = npyv_zero_f32();
    const npyv_f32 one  = npyv_setall_f32(1.0f);
    npyv_b32 mask_nnan = npyv_notnan_f32(x);
    npyv_b32 mask_zero = npyv_cmpeq_f32(x, zero);
    // arithmetic right shift to the edge of the exponent,
    // so the sign bit fills the sign and exponent positions
    npyv_f32 shift_exp  = npyv_reinterpret_f32_s32(
        npyv_shri_s32(npyv_reinterpret_s32_f32(x), 8)
    );
    // x < 0 ? 0 : 1, since the fraction of 1.0 is zero.
    // An intrinsic for `AND With Complement` (named `andnot` on x86) would be better here:
    // npyv_f32 zero_or_one = npyv_andc_f32(one, shift_exp);
    npyv_f32 zero_or_one = npyv_and_f32(one, npyv_not_f32(shift_exp));
    // x == 0 ? h : zero_or_one
    npyv_f32 ret = npyv_select_f32(mask_zero, h, zero_or_one);
    // assumed completion: pass NaN inputs through so the result matches the scalar function
    return npyv_select_f32(mask_nnan, ret, x);
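The shift-and-mask step can be checked without a compiler by emulating it on float32 bit patterns. The sketch below uses only the Python standard library; the helper names (f32_bits, bits_f32, sra32, heaviside_bits) are hypothetical, and the NaN and zero branches are written as plain conditionals rather than vector selects.

```python
import math
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x stored as IEEE 754 binary32, as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Inverse of f32_bits: reinterpret 32 bits as a float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def sra32(b: int, n: int) -> int:
    """Arithmetic right shift of a 32-bit pattern (the sign bit is replicated)."""
    signed = b - (1 << 32) if b & 0x80000000 else b
    return (signed >> n) & 0xFFFFFFFF

ONE_BITS = f32_bits(1.0)  # 0x3F800000: exponent bits only, zero fraction

def heaviside_bits(x: float, h: float) -> float:
    if math.isnan(x):
        return math.nan
    if x == 0:
        return h
    # After shifting by 8, bits 31..23 all equal the sign bit of x.
    shifted = sra32(f32_bits(x), 8)
    # AND with the complement: clears 1.0 to 0.0 when x is negative.
    return bits_f32(ONE_BITS & ~shifted)

for x in (-math.inf, -2.5, 1e-30, 3.0, math.inf):
    print(x, heaviside_bits(x, 0.5))
```

The trick relies on 1.0 having no fraction bits set: only bits 29..23 of the mask matter, and those positions are all covered by the replicated sign bit.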

seiko2plus · 2021-11-03T15:48:56Z

@howjmay, try using the _simd module during the design stage; it can reduce your time and effort.

    from numpy.core import _simd as simd

    v = simd.baseline  # for AVX2 use simd.FMA3__AVX2 or simd.targets['FMA3__AVX2']
    print("the available targets ", simd.targets)

    def heaviside_f32(x, h):
        zero = v.zero_f32()
        one = v.setall_f32(1.0)
        mask_nnan = v.notnan_f32(x)
        mask_zero = v.cmpeq_f32(x, zero)
        # ....

    # test your kernel
    heaviside_f32(v.setall_f32(1), v.setall_f32(2))

github-actions bot added the 01 - Enhancement label Aug 29, 2021

howjmay force-pushed the simd-heaviside branch 29 times, most recently from eb5c1e6 to a92de0c on August 30, 2021 08:26

howjmay marked this pull request as draft August 30, 2021 17:37

howjmay force-pushed the simd-heaviside branch 2 times, most recently from c822aa6 to b71e0d2 on September 1, 2021 13:42

seiko2plus requested changes Sep 4, 2021


seiko2plus reviewed Sep 4, 2021


numpy/core/src/umath/loops_binary_fp.dispatch.c.src (outdated)

seiko2plus added the component: SIMD label Sep 4, 2021

howjmay force-pushed the simd-heaviside branch from 8bb49cd to ba39428 on September 29, 2021 16:55

howjmay and others added 3 commits September 30, 2021 00:56

ENH: Add SIMD implementation for heaviside

6864da9

ENH: Use cmp intrinsics

9e4b7bc

ENH, SIMD: Prepare a new dispatchable source for general binary fp operations such heaviside.

3210b13

howjmay force-pushed the simd-heaviside branch 10 times, most recently from f980c66 to 9312d18 on October 2, 2021 08:51

howjmay force-pushed the simd-heaviside branch 3 times, most recently from 038beb4 to b8a3438 on October 2, 2021 09:31

ENH, SIMD: Add heaviside in dispatchable source

10fe327

howjmay force-pushed the simd-heaviside branch from b8a3438 to 10fe327 on October 2, 2021 09:32

seiko2plus mentioned this pull request Nov 3, 2021

ENH: Add SIMD operation copysign #19770

Closed

howjmay closed this Aug 27, 2024
