-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: Better report integer division overflow #21507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
956f56b
to
254dbd0
Compare
254dbd0
to
bacf182
Compare
This took longer than expected and got caught up with a bunch of stuff, apologies for the delay, will fix the small difference in behavior mentioned above soon. |
bacf182
to
e57c99b
Compare
Thanks again @seberg for the detailed help in this one. It's ready for review. Will just wait out the CI for windows in particular for good measure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me. I made a lot of silly comments, the only important points are:
1.I think should check for the 0 result for fmod/remainder.
2. The test failure is real, but not an actual problem. There is a #pragma
on @scalar@_richcompare
(or similar), that we probably need to copy to make things pass here as well.
Otherwise this looks ready to go in to me. I have to admit, I did not double check whether we may have other SIMD paths that can trigger the issue. I assume not, and if we do the tests should weed them out.
(The only thing I am wondering about is backporting tests if we are not sure whether they might fail.)
numpy/core/tests/test_umath.py
Outdated
# then, result will be a larger type than dividend and will not | ||
# result in an overflow for `divmod` and `floor_divide`. | ||
if np.dtype(dividend_dtype).itemsize >= np.dtype( | ||
divisor_dtype).itemsize and operation in ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just dividend_type.itemsize
, just to make it a bit shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dividend_dtype
is type yeah?
>>> np.int8.itemsize
<attribute 'itemsize' of 'numpy.generic' objects>
>>> np.int8.itemsize < np.int16.itemsize
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'getset_descriptor' and 'getset_descriptor'
>>> np.dtype(np.int8).itemsize
1
c662b9b
to
197d7c0
Compare
Oh I forgot the |
Seems like a rebase fixed the issue due to some other change. I think this is ready for review @seberg |
*out = 0; | ||
} | ||
return 0; | ||
return NPY_FPE_DIVIDEBYZERO; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the above be:
*out = 0;
if (b == 0) {
return NPY_FPE_DIVIDEBYZERO;
}
return 0;
Since this is a remainder? I am surprised the test wouldn't catch it though, so maybe I am misreading.
Other than this question, I think this is ready. Thanks for working on fixing this long-standing issue!
@@ -167,7 +174,7 @@ static NPY_INLINE int | |||
} | |||
#if @neg@ | |||
else if (b == -1 && a == NPY_MIN_@NAME@) { | |||
*out = a / b; | |||
*out = -NPY_MIN_@NAME@; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'm just confused at this point as in arrays we do set it to 0
numpy/numpy/core/src/umath/loops_arithmetic.dispatch.c.src
Lines 293 to 295 in 5447192
if (b == 0 || (a == NPY_MIN_INT@len@ && b == -1)) { | |
npy_set_floatstatus_divbyzero(); | |
*dst1 = 0; |
But how are the tests passing for me locally, I need to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we always do this? It seems like all of this is a bit of a mess, NPY_MIN_@NAME@
(in this particular case), does seem a bit more sensible to me, since it is the "correct" overflowing result. OTOH, I don't particularly want to change it if it was always different.
Yeah, it seems we should have taken a short break and map out the exact behavior a bit clearer earlier...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler warns on
numpy/core/src/umath/scalarmath.c.src:177:16: warning: integer overflow in expression ‘-2147483648’ of type ‘int’ results in ‘-2147483648’ [-Woverflow]
177 | *out = -NPY_MIN_@NAME@;
| ^
numpy/core/tests/test_umath.py
Outdated
TestDivisionOverflows.operator.floordiv): | ||
# FIXME: Raise the same error in case of scalars and arrays | ||
with pytest.raises( | ||
RuntimeWarning if operation is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very happy with this difference in behavior, but thought of following up as it might end up a bigger refactor? Ok with fixing now as well, let me know what's best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subtleties... Which errors are we giving? int_min / -1
(or all other operations), should probably give overflow warnings (rather than divide-by-zero), and I think this is what you see here?
But I am not sure if not all operations here should be overflow warnings/errors (which we should set to raise in the decorator, since we are testing them)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think RuntimeWarning
for underflow is the best error.
Now I think the reason for so much of different behaviors is because floor_div in SIMD is not handled in all loops. Your guess on
check whether we may have other SIMD paths that can trigger the issue. I assume not, and if we do the tests should weed them out.
is accurate. I'll fix all of them here and set the result to 0 making it uniform across.
I think we all can agree -MIN is definitely an overflow and lets change it to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we all can agree -MIN is definitely an overflow and lets change it to 0
But the correct "rollover" result of -MIN
overflowing is -MIN
? So I think it is the better result, especially for -MIN
itself (and why would MIN/-1
be different?). MIN
is much more of a warning that something is wrong and it seems also like the "well defined" behavior (if the C-standard would define 2-complement signed integer overflow as defined rather than undefined)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok that does make sense. I forgot we do have a return value to indicate any errors as well. I'll go ahead and set it to -MIN
(in a way the compiler does not treat it as a warning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we don't have it really, I guess. But I do feel it may be useful to users to get MIN
; 0 is just not a "weird" value :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha yeah agree on that. Yeah let's set it to MIN
and raise an Overflow
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to summarize how I current see things:
- For the "overflow" paths, I would argue for
MIN
as the correct result and would be happy if we change it if there are currently paths that return 0. - For the "division by zero" paths, I don't have a clear opinion. To some degree, I think
MIN
might be the better "error" value, but I think we had been consistent about returning 0, so I tend to think we should just keep doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to the proposal, I'll make the changes following this. It would be nice If we can officially standardize it somewhere :)
163f7a5
to
b9c27d5
Compare
abbc022
to
85f1b09
Compare
@@ -167,7 +174,7 @@ static NPY_INLINE int | |||
} | |||
#if @neg@ | |||
else if (b == -1 && a == NPY_MIN_@NAME@) { | |||
*out = a / b; | |||
*out = NPY_MIN_@NAME@; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok it seems like its not covered here as well which is weird, let me check. ref: #21648 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm it's hitting in my machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which test? Maybe somehow it is skipped in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy/core/tests/test_umath.py::TestDivisionOverflows::test_overflows[floordiv-int8-int8]
and similar
https://github.com/numpy/numpy/pull/21507/files#diff-ddb5d5f0dc988a0ab77107bb335499fd1ae477f73343f244e3c4d039bdca5166R786 Line to be precise
// Divide by zero | ||
quo = npyv_select_@sfx@(bzero, vzero, quo); | ||
// Overflow | ||
quo = npyv_select_@sfx@(overflow, vmin, quo); | ||
npyv_store_@sfx@(dst1, quo); | ||
npyv_store_@sfx@(dst2, npyv_and_@sfx@(cvtozero, rem)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ganesh-k13 , see below one change needed for VSX/Power as the variable/vector cvtozero
was deleted in one of the commits.
One more thing: your tests are passing when I execute them on Power/VSX.
Thank you for applying the changes for Power/VSX as well :-) 👍 .
P.S.: I realized that the other cvtozero
that is used in the other function that implements the unsigned integer types could also be removed to reduce 2 SIMD instructions into just one (select+and --> select)
.
// Divide by zero | |
quo = npyv_select_@sfx@(bzero, vzero, quo); | |
// Overflow | |
quo = npyv_select_@sfx@(overflow, vmin, quo); | |
npyv_store_@sfx@(dst1, quo); | |
npyv_store_@sfx@(dst2, npyv_and_@sfx@(cvtozero, rem)); | |
// Divide by zero | |
quo = npyv_select_@sfx@(bzero, vzero, quo); | |
rem = npyv_select_@sfx@(bzero, vzero, rem); | |
// Overflow | |
quo = npyv_select_@sfx@(overflow, vmin, quo); | |
rem = npyv_select_@sfx@(overflow, vzero, rem); | |
npyv_store_@sfx@(dst1, quo); | |
npyv_store_@sfx@(dst2, rem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @rafaelcfsousa! I have applied the changes to both signed and unsinged, can you take a look once please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for applying the changes!! 👍
There are two other small changes for Power/VSX as shown below.
- We also need the addition of the line
rem = npyv_select_@sfx@(bzero, vzero, rem);
after the line below as shown in the suggested change
box in my previous message (above).
quo = npyv_select_@sfx@(bzero, vzero, quo); |
It is required as we compute the rem of the operation divmod
using the equation below:
rem = a - (b * quo)
which means that we have to zero rem
for b
equal to zero, otherwise rem
results in a
.
- About the variable
cvtozero
, the changes worked perfectly for both signed and unsigned types. But I forgot to say that one other variable (vneg_one
) would become unused after removingcvtozero
, see below:
numpy/core/src/umath/loops_modulo.dispatch.c.src:184:19: warning: unused variable ‘vneg_one’ [-Wunused-variable]
184 | const npyv_@sfx@ vneg_one = npyv_setall_@sfx@(-1);
You can remove the variable vneg_one
defined at numpy/core/src/umath/loops_modulo.dispatch.c.src:184 (see below).
const npyv_@sfx@ vneg_one = npyv_setall_@sfx@(-1); |
That's all, thanks again for applying the changes for VSX4/Power10. :-) 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, so we perform the combined action of cvtozero
using the two selects. Thanks a lot Rafael! Really shooting in the dark here without the hardware.
Thanks, I took the liberty to add two pretty extensive tests (maybe we can use the pattern elsewhere too), but keep the old tests around. The reason is that the scalar path I think is covered (coverage is wrong, first time I saw that). But the other two paths should actually be covered (and should be now). However, I don't trust the vsx contig_scalar paths. @rafaelcfsousa could you have a look into it? I suppose as long as they don't crash it would already be pretty good, but maybe it is not too hard to get the warnings right. (I suspect we need special loops for the two special scalar values |
Hi @seberg, both build and tests (including the ones you just added in ac5a846) are working correctly on a Power10/VSX4 machine.
About the special cases in the VSX code:
|
Thanks for the quick check, had a small problem with the test, but I don't think that was the problem :). What I missed is that the contig+scalar path is disabled early on for the |
OK, the tests coverage only claims the scalar path is not hit (which does not add up). I am planning to merge this soon unless anyone has comments on my test additions. |
Does the change affect benchmarks or is the |
Hmmm, good point, it is a necessary fix for remainder (and maybe divmod) to fix a bug. The scalar paths, I would like the overflow warning, but if adding these 1-2 integer ops has noticeable slowdown, we could try removing the check (and warning) from the the other array paths. We don't typically give overflow warnings anyway. |
Thanks a lot for the tests @seberg and @rafaelcfsousa! Let me check the benchmarks to see if we have any slowdown. |
Not sure if this covers what we need? bench_ufunc.UFunc
|
It seems the benchmark changes to the division ufuncs are not significant. |
Having a quick look, I am not sure the benchmark actually check the integer functions well (focusing on float64). |
I thought of adding an integer path. But man is it complicated to understand the setup :). I wrote a custom piece: [EDIT] Added class UFuncInt(Benchmark):
params = [np.sctypes['int']]
param_names = ['dtype']
timeout = 10
def setup(self, dtype):
np.seterr(all='ignore')
low, high = np.iinfo(dtype).min, np.iinfo(dtype).max
self.arrays = [np.random.randint(low, high, size=i, dtype=dtype) for i in range(1, 10000)]
self.divisor = np.array([np.random.randint(low, high)], dtype=dtype)
def time_ufunc_int_divide(self, dtype):
for a in self.arrays:
np.floor_divide(a, self.divisor)
np.divmod(a, self.divisor)
np.fmod(a, self.divisor) and got:
But this seems definitely wrong though yeah? There has to be some sort of impact? I changed |
Probably best to split up the test by function and then limit it to a single run with a large array (10000 elements should be fine). Small arrays are not too interesting (from a dtype perspective), since ufunc overhead will dominate those. Do we have special paths for contiguous arrays and contig+scalar at all (except for VSX4?). If we do not have those, maybe the code is just not optimized enough that a difference can show up. |
I tried this: class UFuncInt(Benchmark):
params = [np.sctypes['int']]
param_names = ['dtype']
timeout = 10
def setup(self, dtype):
np.seterr(all='ignore')
low, high = np.iinfo(dtype).min, np.iinfo(dtype).max
self.array = np.random.randint(1, high, size=10000, dtype=dtype)
self.divisor = np.array([np.random.randint(1, high)], dtype=dtype)
def time_ufunc_int_divide(self, dtype):
np.floor_divide(self.array, self.divisor)
def time_ufunc_int_divmod(self, dtype):
np.divmod(self.array, self.divisor)
def time_ufunc_int_fmod(self, dtype):
np.fmod(self.array, self.divisor) And got the same
Yeah, I think this is a very good explanation. During our SIMD migration from libdivide(#18766), we implemented this paper. Now this gets us to about at least 20 instruction miminimum, barring other overheads. Now adding this |
Does this PR need more review? |
Thanks Ganesh, sorry for keeping this hanging so long. I just browesed through once more, but there was not anything to be done beyond checking that we don't have a big performance degradation. We should try to backport this and the related PR, and we discussed yesterday, that we should not worry about the tiny changes in behavior since this is extremely niche (niche enough that the bug was not noticed in practice). I will try to create backport PR with both very soon (maybe its all just applying this anyway). We should ping remember to ping |
Thanks a lot @seberg! I'll raise a backport PR right away, no worries. |
Thanks, if you have time for it that is great. But do not feel compelled! |
Changes
Related: #21506
Finishes: #19260