-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
BUG, SIMD: Fix invalid value encountered in several ufuncs #22771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
351397f
to
65fa37e
Compare
981816d
to
267e02e
Compare
0b7a4fd
to
45b56f8
Compare
45b56f8
to
a540f50
Compare
a540f50
to
b5a4e2f
Compare
b5a4e2f
to
5bf0e44
Compare
The failure on ppc64le looks legit. |
To save some clicks: the error is here and is when checking that using |
Providing non-signaling comparison intrinsics that guarantee no FP invalid exception in case of qNaN sounds great but it cost unacceptable extra intrinsics on ppc64le(VSX) and x86(SSE). Therefore, an integer definition #NPY_SIMD_CMPSIGNAL has been provided instead to differenate between SIMD extensions that support only supports signaling comparison.
6d22737
to
d02fc70
Compare
The Power/ISA guide wasn't clear to me, I thought the legacy AltiVec FP comparison instructions are not going to raise invalid FP exceptions for quite nans. However, I discarded the whole patch that related to implementing non-signaling fp comparison, it was a bad approach from my side, see the last commit message for more clarification. |
close/open retrigger Travis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks fine, the new define also seems like a good choice to me. The rounding changes are hard to follow but the logic looks sound.
There are pretty extensive tests for floor
, ceil
, etc. in the SIMD tests, so I think this is good.
PS: I am not immediately sure how well weird corner cases are covered (especially for float32, like np.nextafter(2, 0)
), but it seems unlikely to be an issue. We also seem to not have any integration tests, but that would be feature creep here.
npyv_b@len@ nnan_mask = npyv_notnan_@sfx@(x); | ||
npyv_@sfx@ x_exnan = npyv_select_@sfx@(nnan_mask, x, npyv_setall_@sfx@(@default_val@)); | ||
npyv_@sfx@ out = __svml_@func@@func_suffix@(x_exnan); | ||
out = npyv_select_@sfx@(nnan_mask, out, x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh an SVML issue, interesting...
// a if |a| >= 2^23 or a == NaN | ||
npyv_u32 mask = vcleq_f32(abs_x, two_power_23); | ||
mask = vandq_u32(mask, nnan_mask); | ||
return vbslq_f32(mask, floor, a); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks right (didn't try to look make sure the corner cases are, the tests surely do). I guess this approach to floor
is just a bit faster than then the casting approach (with the dance necessary to mask large values)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this approach to floor is just a bit faster than then the casting
To avoid testing finite then yes.
necessary to mask large values
To avoid divergences caused by sub/add
|
||
data_cmp = [func(a, b) for a, b in zip(data_a, data_b)] | ||
cmp = to_bool(intrin(vdata_a, vdata_b)) | ||
assert cmp == data_cmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice parametrizing this :).
)) | ||
def test_unary_spurious_fpexception(self, ufunc, dtype, data, escape): | ||
if escape and ufunc in escape: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be pytest.xfail()
or pytest.skip()
? That way there is a very small chance to notice it once day again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be pytest.xfail() or pytest.skip()?
It's normal to raise this kind of domain/invalid fp exception but if you mean the second case that related to float16 then yes.
Hmmmpf, tried to cancel and start the Travis CI ppc64le job, but right now it doesn't seem to help to get it started... |
close/reopen |
OK, CI ran through now, lets just put it in, can follow-up, but I doubt there is anything (except removing SSE2 without SSE41 to simplify maintanence ;)). Thanks Sayed! |
@seiko2plus we should to follow up on this for 1.24.1 since this is backported. I am getting these on M1:
Unless the fix is very simple, I suspect xfailing the test on apple is just as well, though. |
I need some time to think about the consequences of this move, SSE2 is still part of our default baseline, and dropping it will require dispatching each SIMD kernel.
I will look into it, it would be great if you could provide a build log or verbose test result at least. |
These are all the If you really want, here is the build log:
|
@seberg, I wasn't able to reproduce it, maybe the clang commits an aggressive optimization leads somehow to optimizing out the new changes: numpy/numpy/core/src/umath/loops_trigonometric.dispatch.c.src Lines 128 to 131 in d02fc70
or maybe your build was cached as I can see from your build log: NFO: customize UnixCCompiler using new_build_clib
INFO: CCompilerOpt.__init__[813] : load cache from file -> /Users/sebastianb/forks/numpy/build/temp.macosx-11.0-arm64-3.10/ccompiler_opt_cache_clib.py
INFO: CCompilerOpt.__init__[824] : hit the file cache
running build_ext
INFO: customize UnixCCompiler
INFO: customize UnixCCompiler using new_build_ext
INFO: CCompilerOpt.__init__[813] : load cache from file -> /Users/sebastianb/forks/numpy/build/temp.macosx-11.0-arm64-3.10/ccompiler_opt_cache_ext.py
INFO: CCompilerOpt.__init__[824] : hit the file cache
INFO: customize UnixCCompiler
WARN: #### ['arm64-apple-darwin20.0.0-clang', '-Wno-unused-result', '-Wsign-compare', '-Wunreachable-code', '-DNDEBUG', '-fwrapv', '-O2', '-Wall', '-fPIC', '-O2', '-isystem', '/opt/homebrew/Caskroom/mambaforge/base/include', '-arch', 'arm64', '-fPIC', '-O2', '-isystem', '/opt/homebrew/Caskroom/mambaforge/base/include', '-arch', 'arm64', '-ftree-vectorize', '-fPIC', '-fPIE', '-fstack-protector-strong', '-O2', '-pipe', '-isystem', '/opt/homebrew/Caskroom/mambaforge/base/include', '-D_FORTIFY_SOURCE=2', '-isystem', '/opt/homebrew/Caskroom/mambaforge/base/include'] #######
INFO: customize UnixCCompiler using new_build_ext Could you please provide a non-cached build log and version of clang? |
Hmmm, maybe the compiler got updated recently with Ventura? (I am seeing this on main branch, but I didn't think it would matter). build.log |
closes #22461, #22772, #22797
for more clarification check the linked issues above