-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
py: Fixes and test coverage for 64-bit big integer representations. #16953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
py: Fixes and test coverage for 64-bit big integer representations. #16953
Conversation
ec883a8
to
c6f3856
Compare
// and passes tests. | ||
|
||
#define MICROPY_LONGINT_IMPL (MICROPY_LONGINT_IMPL_LONGLONG) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yoctopuce suggested we test MICROPY_OBJ_REPR_C
and MICROPY_LONGINT_IMPL_LONGLONG
together, which makes sense - and I don't think we have CI coverage for the former at the moment.
@dpgeorge what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a very good idea to test MICROPY_OBJ_REPR_C
in CI.
It's enabled on esp8266 and stm32 Arduino boards, so it does get tested for a release. But would be great to test it in CI.
Enabling representation C in this new longlong variant is a good idea!
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #16953 +/- ##
=======================================
Coverage 98.54% 98.54%
=======================================
Files 169 169
Lines 21890 21896 +6
=======================================
+ Hits 21572 21578 +6
Misses 318 318 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code size report:
|
0dd8885
to
5d8e0c1
Compare
The other thing that might be worth considering here is 64-bit arithmetic overflow, should this raise OverflowError? (Currently arithmetic operations return the overflow result, which is UB in C - but I think on all our architectures it will wrap.) |
f8acf08
to
c862fb1
Compare
This is a bit troubling, my local |
The code only fails when run through mpy-cross (test option
2^63 is not a valid positive When using |
Ah of course, thanks! |
c862fb1
to
aa0917b
Compare
I feel like this difference in behaviour is a good example of why we might consider changing the 64-bit arithmetic operations to raise on overflow. |
I'm totally with you |
This PR now includes arithmetic overflow checks for 64-bit bigints! |
Excellent, thank you ! |
py/objint_longlong.c
Outdated
@@ -25,6 +25,7 @@ | |||
* THE SOFTWARE. | |||
*/ | |||
|
|||
#include <errno.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a bit of a nuisance to need to include this header... it sometimes doesn't exist on embedded systems.
At the very least I'd suggest moving this include to within the MICROPY_LONGINT_IMPL == MICROPY_LONGINT_IMPL_LONGLONG
guard, so it's only included if needed (because this file will be compiled on all systems regardless of the config).
In the long run, would be best to remove dependency on errno
, but I'm not sure it's worth the effort given this config is likely not used much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved it down behind the preprocessor guard.
I did start writing a version that doesn't depend on strtoll, using the approach @yoctopuce suggested for mp_parse_num_integer()
to handle 64-bit. It has potential to work quite nicely, but it wasn't particularly simple either so I shelved it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my version, in case it helps:
const byte* restrict str = (const byte*)*str_;
const byte* restrict top = str + len;
long long int_val = 0;
for (; str < top; str++) {
// get next digit as a value
mp_uint_t dig = *str;
if ('0' <= dig && dig <= '9') {
dig -= '0';
} else if (dig == '_') {
continue;
} else {
dig |= 0x20; // make digit lower-case
if ('a' <= dig && dig <= 'z') {
dig -= 'a' - 10;
} else {
// unknown character
break;
}
}
if (dig >= (mp_uint_t)base) {
break;
}
// add next digit
int_val = int_val * base + dig;
}
// negate value if needed
if (neg) {
int_val = -int_val;
}
mp_obj_t result = mp_obj_new_int_from_ll(int_val);
*str_ = (char*)str;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got inspired to go and finish the version of this I was working on, have updated this PR with the result.
py/objint_longlong.c
Outdated
@@ -144,9 +148,22 @@ mp_obj_t mp_obj_int_unary_op(mp_unary_op_t op, mp_obj_t o_in) { | |||
} | |||
} | |||
|
|||
// 64-bit binary operations have overflow checking when supported by the compiler, | |||
// multiply is used in a few places | |||
#if __has_builtin(__builtin_mul_overflow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be wrapped in #if defined __has_builtin
. See py/misc.h
and https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fbuiltin.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or follow lib/cmsis/inc/cmsis_gcc.h
:
#ifndef __has_builtin
#define __has_builtin(x) (0)
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also hit that __has_builtin issue when integrating this commit in my dev branch, and as __has_builtin()
is sometime undefined on compilers that do have support for __builtin_*_overflow
, I ended up using this code:
#if defined(_MSC_VER)
#define ENABLE_INTSAFE_SIGNED_FUNCTIONS
#include <intsafe.h>
#define add_overflow(lhs,rhs,res) LongLongAdd(lhs, rhs, res)
#define sub_overflow(lhs,rhs,res) LongLongSub(lhs, rhs, res)
#define mul_overflow(lhs,rhs,res) LongLongMult(lhs,rhs,res)
#elif defined(__EMSCRIPTEN__ ) || (defined(__GNUC__) && __GNUC__ >= 5)
#define add_overflow(lhs,rhs,res) __builtin_add_overflow(lhs, rhs, res)
#define sub_overflow(lhs,rhs,res) __builtin_sub_overflow(lhs, rhs, res)
#define mul_overflow(lhs,rhs,res) __builtin_mul_overflow(lhs, rhs, res)
#endif
#if !defined(add_overflow)
#include <limits.h>
__inline bool add_overflow(long long lhs, long long rhs, long long *result) {
// Check for overflow
if ((rhs > 0) && (lhs > LLONG_MAX - rhs)) {
return 1; // Overflow
}
if ((rhs < 0) && (lhs < LLONG_MIN - rhs)) {
return 1; // Overflow
}
*result = lhs + rhs;
return 0; // No overflow
}
__inline bool sub_overflow(long long lhs, long long rhs, long long *result) {
// Check for overflow
if ((rhs > 0) && (lhs < LLONG_MIN + rhs)) {
return 1; // Overflow
}
if ((rhs < 0) && (lhs > LLONG_MAX + rhs)) {
return 1; // Overflow
}
*result = lhs - rhs;
return 0; // No overflow
}
__inline bool mul_overflow(long long lhs, long long rhs, long long * result)
{
if (lhs > 0) {
if (rhs > 0) {
if (lhs > (LLONG_MAX / rhs)) {
return 1; // Overflow
}
} else {
if (rhs < (LLONG_MIN / lhs)) {
return 1; // Overflow
}
}
} else {
if (rhs > 0) {
if (lhs < (LLONG_MIN / rhs)) {
return 1; // Overflow
}
} else {
if ((lhs != 0) && (rhs <= (LLONG_MAX / lhs))) {
return 1; // Overflow
}
}
}
*result = lhs * rhs;
return 0; // No overflow
}
#endif
My embedded platforms are all using GCC >= 5, so the bigger+slower inline code only ends up in OS-based test builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive! @dpgeorge do you think it's worth pulling any/all of this into MP, or stick with __has_builtin
and fallback to not checking overflow? (__has_builtin
is GCC 10 and up, first released 2020.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I've added the check for defined __has_builtin
already.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rely only on __has_builtin
, we are going to significantly increase code size for all compiler version between 2015 ( GCC 5+) and 2020 (GCC 10+).
As __builtin_*_overflow
functions have been officially included in GCC 5, you might want to override the default "not supported" setting by checking (defined(__GNUC__) && __GNUC__ >= 5)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other option is to disable the overflow check on those compiler versions (which is what the earlier version of this PR did, although the latest version disables the overflow check for some options and keeps it for multiply due to integer parsing.)
@yoctopuce I don't suppose you have any measurement of the difference? There are some cases where the compiler can peephole optimise in a native overflow check from equivalent C code (but unsure if this is one of those cases). EDIT: Otherwise I will measure this difference next week.
These will run on all ports which support them, but importantly they'll also run on ports that don't support arbitrary precision but do support 64-bit long ints. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
Signed-off-by: Angus Gratton <angus@redyak.com.au>
These tests cover the use of mp_obj_new_int_from_str_len when mp_parse_num_integer overflows the SMALLINT limit. Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>
Leans on the test case added in the parent commit, as json.loads() seems to be a trivial way to cause this. The test case is moved to an '_int_64' variant so the json_loads.py test can still run on builds without large int support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
1988172
to
ffaff9e
Compare
Signed-off-by: Angus Gratton <angus@redyak.com.au>
This relies on errno unfortunately, but there's no other way to disambiguate MAX_LONGLONG and an out of range value (apart from manually re-implementing strtoll). Includes some test workarounds to account for things which now overflow: - uctypes_array_load_store test was failing already, now won't parse. - all the ffi_int tests contain 64-bit unsigned values, that won't parse as long long. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>
Relies on arbitrary precision math, so won't run on a port which has threads & limited bigint support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>
The other performance tests run and pass with only 64-bit big integer support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>
Enabled when supported via compiler builtins (i.e. gcc/clang). Also adds an error when shifting by a negative value. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
7d0818d
to
02112a4
Compare
Will be used in a follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>
Makes it compatible with the __builtin_mul_overflow() syntax, used in follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>
If long integer support is 'long long' then mp_parse_num_integer() can parse to it directly instead of failing over from small int. One tricky case this correctly overflows on is things like int("9223372036854775808") which is one more than LLONG_MAX in decimal. No unit test case added for this as it's too hard to detect 64-bit long integer mode. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
02112a4
to
8555e92
Compare
Summary
As pointed out by @yoctopuce in this discussion and #16932, there is poor test coverage for the optional 64-bit bigint representation and this representation has some bugs.
This PR aims to improve this:
int_64
equivalent.longlong
unix build variant that enables 64-bit long int mode. Mostly useful for CI testing.Testing
longlong
variant on unix, looked for failures.Trade-offs and Alternatives
errno
which is a bit fraught in embedded systems. We could drop this commit out and accept the current saturating behaviour instead. The only way to correctly detect overflow without usingstrtoll
anderrno
would be to add a dependency onsscanf
or to re-implementstrtoll
ourselves (@yoctopuce suggested that when building in 64-bit mode we compilemp_parse_num_integer
to parse 64-bit integers, which could be quite a neat solution.)