-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: Fix from_float_positional
errors for huge pads
#28149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi, I'm a new contributor to NumPy and I would appreciate any guidance you may have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing the error return looks great. A few comments, mainly:
- It seems a lot easier to just inline the error. It would be fine to do a single error for everything, although I would probably inline the error anyway with a
goto buffer_too_small:
or so. - This fixes the error, but a lot of care needs to be taken about overflows unfortunately:
- Can be fixed with re-arranging some of the length calculations, another way might be to use
int64
for some of these checks. - There are probably a lot more errors here: Every time
maxPrintLen
is checked and a special path taken, the result returned is currently wrong in some way and we need to raise an/the error!
Would you be up to auditing that code as well?
- Can be fixed with re-arranging some of the length calculations, another way might be to use
For simplicity, I really suggest to just raise one single error as noted in one of the comments. In practice, no-one will ever see this error (if they did we should just support it...), so we can be lazy and worry about raising a particularly helpful error.
(I suppose the "no-one" should ever see this error makes me lean towards a RuntimeError
.)
numpy/_core/src/multiarray/dragon4.c
Outdated
/* handle overflow condition safely*/ | ||
if (NPY_TRUE == right_overflow) { | ||
buffer[pos] = '\0'; | ||
return RIGHT_PADDING_OVERFLOW; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is any reason to have multiple error returns here. You can just set the Python error here and return -1
as the only error value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, fixed
numpy/_core/arrayprint.py
Outdated
many characters are to the left of the decimal point. | ||
many characters are to the left of the decimal point. Total length of | ||
string representation of the floating point value cannot exceed 16384, | ||
including pad_right, if any. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this is a lot of detail, nobody should ever see. If we want it, maybe we should put it into the Exceptions
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've simplified this part and moved to a separate section below. Is this what you've meant?
numpy/_core/src/multiarray/dragon4.c
Outdated
npy_int32 count = digits_right - numFractionDigits; | ||
|
||
/* in trim_mode DptZeros, if right padding, add a space for the . */ | ||
if (trim_mode == TrimMode_DptZeros && numFractionDigits == 0 | ||
&& pos < maxPrintLen) { | ||
buffer[pos++] = ' '; | ||
} | ||
|
||
|
||
/* provided digits_rights is too large, can't create the string required*/ | ||
if (pos + count > maxPrintLen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this (and likely others) can still overflow for ridiculous inputs like 2**31-1
.
if (pos + count > maxPrintLen) { | |
if (count > maxPrintLen - pos) { |
re-arranging it should keep us safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I've fixed this where it's relevant.
numpy/_core/src/multiarray/dragon4.c
Outdated
if (pos + count > maxPrintLen) { | ||
right_overflow = NPY_TRUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why continue and not set the error and return here right away, we shouldn't use the result, so it doesn't matter if the buffer is null terminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed according to your suggestion
numpy/_core/src/multiarray/dragon4.c
Outdated
error_msg = "An unexpected error occurred";\ | ||
break;\ | ||
}\ | ||
PyErr_SetString(error_type, error_msg);\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set the error above, this code can stay unchanged. Also Dragon4_PrintFloat_##format
is used below a second time and both must stay aligned.
We should make sure there is a test for both calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted the code for this function and and switched to raising errors in FormatPositional
@@ -305,6 +306,9 @@ def test_dragon4_interface(self): | |||
"1.2" if tp != np.float16 else "1.2002") | |||
assert_equal(fpos(tp('1.'), trim='-'), "1") | |||
assert_equal(fpos(tp('1.001'), precision=1, trim='-'), "1") | |||
|
|||
assert_raises_regex(ValueError, "Left padding exceeds buffer size of 16384", fpos, tp('1.047'), precision=2, pad_left=int(1e5)) | |||
assert_raises_regex(ValueError, "Right padding exceeds buffer size of 16384", fpos, tp('1.047'), precision=2, pad_right=int(1e5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charris what is our preferred line length? It seems a bit that the new linter setup ignores line-length?
For here: Would be nice to break it. That is easiest if you use the nicer pattern of:
with pytest.raises(..., match="..."):
(there is a lot of code not using it, but that is mostly because it predates pytest.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split it into several lines to match the expected line-length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg Preferred line length is now <= 88 for Python code. Clang-format is still at 80, but we should probably increase that. Linux is, IIRC, now <= 90 suggested, but not inforced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the thing is that the linter job should have complained: I guess it just means adding a specific line length somewhere.
Thanks @seberg for your review. I've implemented your suggestions. Is there anything else I should add? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! You missed one > maxPrintLen
re-order (I think there are no more, but you could check briefly). We need to fix that and also add tests for huge input values (please do try if it crashes without the last fix!).
Otherwise, a few smaller things, it would be nice if you can refactor the test a bit more at least.
Personally, I might not document the error at all, but don't care. Either way, no need to change (I might yet decide to remove it via the "suggests" before merging).
Anyway, almost good to go in, thanks a lot!
numpy/_core/src/multiarray/dragon4.c
Outdated
count = maxPrintLen - pos; | ||
/* integer part too long for buffer, avoid overflow */ | ||
if (count > maxPrintLen - pos) { | ||
PyErr_SetString(PyExc_RuntimeError, "String buffer overflow"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small suggestion to write something that is easier to understand for Python programmers. Maybe along: "float formating result too large." (in all places of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
numpy/_core/src/multiarray/dragon4.c
Outdated
@@ -1823,14 +1837,17 @@ FormatPositional(char *buffer, npy_uint32 bufferSize, BigInt *mantissa, | |||
if (digits_left > numWholeDigits + has_sign) { | |||
npy_int32 shift = digits_left - (numWholeDigits + has_sign); | |||
npy_int32 count = pos; | |||
|
|||
|
|||
/* provided digits_left is too large, can't create the string required*/ | |||
if (count + shift > maxPrintLen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to above shift
is unbounded in size, so count + shift
might overflow. So rephrase as count > maPrintLen - shift
.
assert_raises_regex(RuntimeError, "String buffer overflow", fpos, | ||
tp('1.047'), precision=2, pad_left=int(1e5)) | ||
assert_raises_regex(RuntimeError, "String buffer overflow", fpos, | ||
tp('1.047'), precision=2, pad_right=int(1e5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reflowing. There is one important thing to do, I think. And that is we should add a test for all of these with the padding set to np.iinfo("uint32").max
because this hits those count + size > maxPrintlen
overflows.
(Ideally you add the tests first and confirm that the test crashes NumPy!)
Then I have a few smaller asks/suggestions that would be great:
- I slightly prefer
10**5
rather thanint(1e5)
. - It would be much clearer to use
with pytest.raises(...):
- Very long tests aren't nice, it would be much better to split out the test into its own function. Ideally, you use
@pytest.mark.parametrize
instead of the loop for[np.float16, np.float32, np.float64]
.
(You should be able to find good examples for these if you search the NumPy code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented all of your suggestions:
- I've refactored the test into several smaller tests
- I've parametrized on type
- I've switched to 10**5
- I've added
np.iinfo("int32").max
as a test value,instead of np.iinfo("uint32").max
because the unsigned value is too large to be passed as `npy_int32' and tests detect this and fail
numpy/_core/src/multiarray/dragon4.c
Outdated
@@ -1823,14 +1837,17 @@ FormatPositional(char *buffer, npy_uint32 bufferSize, BigInt *mantissa, | |||
if (digits_left > numWholeDigits + has_sign) { | |||
npy_int32 shift = digits_left - (numWholeDigits + has_sign); | |||
npy_int32 count = pos; | |||
|
|||
|
|||
/* provided digits_left is too large, can't create the string required*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these comments are needed here because with the Python error text, the code reason is just as clear already, IMO. So I would slightly prefer if you remove them again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, removed comment
56e83bd
to
fa0273d
Compare
After implementing all suggestions, I see that my PR fails one of the tests - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yakovdan. The Cygwin failure is indeed universal, I think so let's ignore it for now 🤞.
Thanks for cleaning the tests up a bit more than necessary even! Definitely a nice improvement.
Mostly think some of the comments are unnecessary, so will just remove them (and that doc note, just because I lean to it, but others may well disagree).
I think this is all done though, will just apply and merge soon or when the tests pass.
from_float_positional
errors for huge pads
This PR adds graceful error handling for np.format_float_positional when provided pad_left or pad_right arguments are too large. * TST: Added tests for correct handling of overflow * BUG: fixed pad_left and pad_right causing overflow if too large * TST: added overflow test and fixed formatting * BUG: fixed overflow checks and simplified error handling * BUG: rewritten excpetion message and fixed overflow check * TST: split test into smaller tests, added large input value * Apply suggestions from code review --------- Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
This PR adds graceful error handling for np.format_float_positional when provided pad_left or pad_right arguments are too large. * TST: Added tests for correct handling of overflow * BUG: fixed pad_left and pad_right causing overflow if too large * TST: added overflow test and fixed formatting * BUG: fixed overflow checks and simplified error handling * BUG: rewritten excpetion message and fixed overflow check * TST: split test into smaller tests, added large input value * Apply suggestions from code review --------- Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net> Signed-off-by: Filipe Laíns <lains@riseup.net>
This PR adds graceful error handling for
np.format_float_positional
when providedpad_left
orpad_right
arguments are too large, leading to an overflow of the string buffer.In such cases, a
ValueError
exception is raised.Added tests and updated the doc to describe limits on
pad_left
andpad_right
fixes #28068