py: Fixes and test coverage for 64-bit big integer representations. #16953

projectgus · 2025-03-18T06:57:06Z

Summary

As pointed out by @yoctopuce in this discussion and #16932, there is poor test coverage for the optional 64-bit bigint representation and this representation has some bugs.

This PR aims to improve this:

Add test naming convention specifically for 64-bit big integers. These will run if there is any big integer support (either 64-bit or arbitrary precision).
Duplicate the basic bigint tests to add int_64 equivalent.
Fix bug where negative 64-bit integers were incorrectly parsed.
Fix bug where 64-bit integer parsing produced invalid results if the buffer wasn't null terminated and the byte after the buffer was a valid digit. Fixes Incorrect parse of large integers in LONGLONG mode #16932. This incorporates the tests submitted in tests/extmod/json_loads: Add test cases for LONGINT parse. #16931 which seem to be a reliable way to get a string buffer which fits this edge case. However, the new tests are moved to a separate file so that the json tests don't depend on bigint support.
Add a longlong unix build variant that enables 64-bit long int mode. Mostly useful for CI testing.
Fix saturating behaviour when 64-bit integer parsing overflows, now it fails instead. This needed further filtering of tests as the ffi_int tests all depend on parsing UINT64_MAX or similar. This happened to work before this check was added, I believe as they got cast back to uint64 when passing to the FFI interface.
Raise OverflowError if an arithmetic operation overflows 64-bit signed integer, at least on gcc & clang.

Testing

Ran test suite on the new longlong variant on unix, looked for failures.

Trade-offs and Alternatives

We could deprecate this mode instead of improving support for it, but it does seem useful in small systems.
Correctly detecting overflow depends on errno which is a bit fraught in embedded systems. We could drop this commit out and accept the current saturating behaviour instead. The only way to correctly detect overflow without using strtoll and errno would be to add a dependency on sscanf or to re-implement strtoll ourselves (@yoctopuce suggested that when building in 64-bit mode we compile mp_parse_num_integer to parse 64-bit integers, which could be quite a neat solution.)

py/objint_longlong.c

projectgus · 2025-03-18T07:02:32Z

ports/unix/variants/longlong/mpconfigvariant.h

+// and passes tests.
+
+#define MICROPY_LONGINT_IMPL           (MICROPY_LONGINT_IMPL_LONGLONG)
+


@yoctopuce suggested we test MICROPY_OBJ_REPR_C and MICROPY_LONGINT_IMPL_LONGLONG together, which makes sense - and I don't think we have CI coverage for the former at the moment.

@dpgeorge what do you think?

Yes, that's a very good idea to test MICROPY_OBJ_REPR_C in CI.

It's enabled on esp8266 and stm32 Arduino boards, so it does get tested for a release. But would be great to test it in CI.

Enabling representation C in this new longlong variant is a good idea!

codecov · 2025-03-18T07:04:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base (e53f262) to head (8555e92).
Report is 4 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #16953   +/-   ##
=======================================
  Coverage   98.54%   98.54%           
=======================================
  Files         169      169           
  Lines       21890    21896    +6     
=======================================
+ Hits        21572    21578    +6     
  Misses        318      318

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2025-03-18T07:07:59Z

Code size report:

   bare-arm:   +20 +0.035% 
minimal x86:   +25 +0.013% 
   unix x64:    -8 -0.001% standard
      stm32:   +28 +0.007% PYBV10
     mimxrt:   +24 +0.006% TEENSY40
        rp2:   +40 +0.004% RPI_PICO_W
       samd:   +24 +0.009% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    -5 -0.001% VIRT_RV32

projectgus · 2025-03-18T23:39:00Z

The only way to correctly detect overflow without using strtoll and errno would be to add a dependency on sscanf or to re-implement strtoll ourselves

The other thing that might be worth considering here is 64-bit arithmetic overflow, should this raise OverflowError? (Currently arithmetic operations return the overflow result, which is UB in C - but I think on all our architectures it will wrap.)

projectgus · 2025-03-19T00:46:40Z

This is a bit troubling, my local longlong builds pass tests but now the CI build is failing them. And I think the CI test run definitely got further than this before...

yoctopuce · 2025-03-19T06:48:13Z

This is a bit troubling, my local longlong builds pass tests but now the CI build is failing them. And I think the CI test run definitely got further than this before...

The code only fails when run through mpy-cross (test option --via-mpy)
The cause is the very last test, which is causing an undetected overflow:

x = 1 << 63

2^63 is not a valid positive long long, but it is silently interpreted as -2^63 when interpreted directly in MicroPython.

When using --via-mpy, mpy-cross uses mpz bigints and can therefore create valid a constant object with the true decimal representation of 2^63 (thanks to MICROPY_COMP_CONST_FOLDING). When the mpy file is loaded, the runtime engine fails to create the constant object and fails with the ValueError....

projectgus · 2025-03-19T07:13:04Z

The code only fails when run through mpy-cross (test option --via-mpy) The cause is the very last test, which is causing an undetected overflow:

Ah of course, thanks!

projectgus · 2025-03-19T07:15:29Z

2^63 is not a valid positive long long, but it is silently interpreted as -2^63 when interpreted directly in MicroPython.

I feel like this difference in behaviour is a good example of why we might consider changing the 64-bit arithmetic operations to raise on overflow.

yoctopuce · 2025-03-19T07:19:23Z

2^63 is not a valid positive long long, but it is silently interpreted as -2^63 when interpreted directly in MicroPython.

I feel like this difference in behaviour is a good example of why we might consider changing the 64-bit arithmetic operations to raise on overflow.

I'm totally with you

projectgus · 2025-03-26T02:40:25Z

This PR now includes arithmetic overflow checks for 64-bit bigints!

yoctopuce · 2025-03-26T05:40:28Z

This PR now includes arithmetic overflow checks for 64-bit bigints!

Excellent, thank you !

dpgeorge · 2025-05-02T03:22:53Z

py/objint_longlong.c

@@ -25,6 +25,7 @@
 * THE SOFTWARE.
 */

+#include <errno.h>


Yeah, it's a bit of a nuisance to need to include this header... it sometimes doesn't exist on embedded systems.

At the very least I'd suggest moving this include to within the MICROPY_LONGINT_IMPL == MICROPY_LONGINT_IMPL_LONGLONG guard, so it's only included if needed (because this file will be compiled on all systems regardless of the config).

In the long run, would be best to remove dependency on errno, but I'm not sure it's worth the effort given this config is likely not used much.

I've moved it down behind the preprocessor guard.

I did start writing a version that doesn't depend on strtoll, using the approach @yoctopuce suggested for mp_parse_num_integer() to handle 64-bit. It has potential to work quite nicely, but it wasn't particularly simple either so I shelved it for now.

Here is my version, in case it helps:

const byte* restrict str = (const byte*)*str_; const byte* restrict top = str + len; long long int_val = 0; for (; str < top; str++) { // get next digit as a value mp_uint_t dig = *str; if ('0' <= dig && dig <= '9') { dig -= '0'; } else if (dig == '_') { continue; } else { dig |= 0x20; // make digit lower-case if ('a' <= dig && dig <= 'z') { dig -= 'a' - 10; } else { // unknown character break; } } if (dig >= (mp_uint_t)base) { break; } // add next digit int_val = int_val * base + dig; } // negate value if needed if (neg) { int_val = -int_val; } mp_obj_t result = mp_obj_new_int_from_ll(int_val); *str_ = (char*)str;

I got inspired to go and finish the version of this I was working on, have updated this PR with the result.

dpgeorge · 2025-05-02T03:25:12Z

py/objint_longlong.c

@@ -144,9 +148,22 @@ mp_obj_t mp_obj_int_unary_op(mp_unary_op_t op, mp_obj_t o_in) {
    }
 }

+// 64-bit binary operations have overflow checking when supported by the compiler,
+// multiply is used in a few places
+#if __has_builtin(__builtin_mul_overflow)


I think this needs to be wrapped in #if defined __has_builtin. See py/misc.h and https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fbuiltin.html

Or follow lib/cmsis/inc/cmsis_gcc.h:

#ifndef __has_builtin #define __has_builtin(x) (0) #endif

I also hit that __has_builtin issue when integrating this commit in my dev branch, and as __has_builtin() is sometime undefined on compilers that do have support for __builtin_*_overflow, I ended up using this code:

#if defined(_MSC_VER) #define ENABLE_INTSAFE_SIGNED_FUNCTIONS #include <intsafe.h> #define add_overflow(lhs,rhs,res) LongLongAdd(lhs, rhs, res) #define sub_overflow(lhs,rhs,res) LongLongSub(lhs, rhs, res) #define mul_overflow(lhs,rhs,res) LongLongMult(lhs,rhs,res) #elif defined(__EMSCRIPTEN__ ) || (defined(__GNUC__) && __GNUC__ >= 5) #define add_overflow(lhs,rhs,res) __builtin_add_overflow(lhs, rhs, res) #define sub_overflow(lhs,rhs,res) __builtin_sub_overflow(lhs, rhs, res) #define mul_overflow(lhs,rhs,res) __builtin_mul_overflow(lhs, rhs, res) #endif #if !defined(add_overflow) #include <limits.h> __inline bool add_overflow(long long lhs, long long rhs, long long *result) { // Check for overflow if ((rhs > 0) && (lhs > LLONG_MAX - rhs)) { return 1; // Overflow } if ((rhs < 0) && (lhs < LLONG_MIN - rhs)) { return 1; // Overflow } *result = lhs + rhs; return 0; // No overflow } __inline bool sub_overflow(long long lhs, long long rhs, long long *result) { // Check for overflow if ((rhs > 0) && (lhs < LLONG_MIN + rhs)) { return 1; // Overflow } if ((rhs < 0) && (lhs > LLONG_MAX + rhs)) { return 1; // Overflow } *result = lhs - rhs; return 0; // No overflow } __inline bool mul_overflow(long long lhs, long long rhs, long long * result) { if (lhs > 0) { if (rhs > 0) { if (lhs > (LLONG_MAX / rhs)) { return 1; // Overflow } } else { if (rhs < (LLONG_MIN / lhs)) { return 1; // Overflow } } } else { if (rhs > 0) { if (lhs < (LLONG_MIN / rhs)) { return 1; // Overflow } } else { if ((lhs != 0) && (rhs <= (LLONG_MAX / lhs))) { return 1; // Overflow } } } *result = lhs * rhs; return 0; // No overflow } #endif

My embedded platforms are all using GCC >= 5, so the bigger+slower inline code only ends up in OS-based test builds.

Impressive! @dpgeorge do you think it's worth pulling any/all of this into MP, or stick with __has_builtin and fallback to not checking overflow? (__has_builtin is GCC 10 and up, first released 2020.)

(I've added the check for defined __has_builtin already.)

If we rely only on __has_builtin, we are going to significantly increase code size for all compiler version between 2015 ( GCC 5+) and 2020 (GCC 10+).

As __builtin_*_overflow functions have been officially included in GCC 5, you might want to override the default "not supported" setting by checking (defined(__GNUC__) && __GNUC__ >= 5)

The other option is to disable the overflow check on those compiler versions (which is what the earlier version of this PR did, although the latest version disables the overflow check for some options and keeps it for multiply due to integer parsing.)

@yoctopuce I don't suppose you have any measurement of the difference? There are some cases where the compiler can peephole optimise in a native overflow check from equivalent C code (but unsure if this is one of those cases). EDIT: Otherwise I will measure this difference next week.

py/objint_longlong.c

tools/ci.sh

These will run on all ports which support them, but importantly they'll also run on ports that don't support arbitrary precision but do support 64-bit long ints. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

Signed-off-by: Angus Gratton <angus@redyak.com.au>

These tests cover the use of mp_obj_new_int_from_str_len when mp_parse_num_integer overflows the SMALLINT limit. Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>

Leans on the test case added in the parent commit, as json.loads() seems to be a trivial way to cause this. The test case is moved to an '_int_64' variant so the json_loads.py test can still run on builds without large int support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

Signed-off-by: Angus Gratton <angus@redyak.com.au>

This relies on errno unfortunately, but there's no other way to disambiguate MAX_LONGLONG and an out of range value (apart from manually re-implementing strtoll). Includes some test workarounds to account for things which now overflow: - uctypes_array_load_store test was failing already, now won't parse. - all the ffi_int tests contain 64-bit unsigned values, that won't parse as long long. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>

Relies on arbitrary precision math, so won't run on a port which has threads & limited bigint support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>

The other performance tests run and pass with only 64-bit big integer support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>

Enabled when supported via compiler builtins (i.e. gcc/clang). Also adds an error when shifting by a negative value. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

Will be used in a follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>

Makes it compatible with the __builtin_mul_overflow() syntax, used in follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>

If long integer support is 'long long' then mp_parse_num_integer() can parse to it directly instead of failing over from small int. One tricky case this correctly overflows on is things like int("9223372036854775808") which is one more than LLONG_MAX in decimal. No unit test case added for this as it's too hard to detect 64-bit long integer mode. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

projectgus added the py-core Relates to py/ directory in source label Mar 18, 2025

projectgus commented Mar 18, 2025

View reviewed changes

py/objint_longlong.c Outdated Show resolved Hide resolved

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from ec883a8 to c6f3856 Compare March 18, 2025 06:59

projectgus commented Mar 18, 2025

View reviewed changes

projectgus mentioned this pull request Mar 18, 2025

tests/extmod/json_loads: Add test cases for LONGINT parse. #16931

Open

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 2 times, most recently from 0dd8885 to 5d8e0c1 Compare March 18, 2025 23:36

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 2 times, most recently from f8acf08 to c862fb1 Compare March 19, 2025 00:22

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from c862fb1 to aa0917b Compare March 19, 2025 07:14

projectgus added this to the release-1.26.0 milestone Mar 26, 2025

dpgeorge reviewed May 2, 2025

View reviewed changes

py/objint_longlong.c Outdated Show resolved Hide resolved

dpgeorge reviewed May 2, 2025

View reviewed changes

tools/ci.sh Outdated Show resolved Hide resolved

projectgus added 4 commits May 2, 2025 14:38

py/objint_longlong: Support parsing negative 64-bit integers.

9d05b7d

Signed-off-by: Angus Gratton <angus@redyak.com.au>

tests/extmod/json_loads: Add test cases for LONGINT parse.

ac37e77

These tests cover the use of mp_obj_new_int_from_str_len when mp_parse_num_integer overflows the SMALLINT limit. Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from 1988172 to ffaff9e Compare May 2, 2025 04:38

projectgus added 5 commits May 2, 2025 14:59

unix,tools/ci: Add a 'longlong' variant to test 64-bit bigints in CI.

a0aa177

Signed-off-by: Angus Gratton <angus@redyak.com.au>

py/objint_longlong: Add arithmetic overflow checks.

4fa5402

Enabled when supported via compiler builtins (i.e. gcc/clang). Also adds an error when shifting by a negative value. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 2 times, most recently from 7d0818d to 02112a4 Compare May 2, 2025 06:43

projectgus added 3 commits May 2, 2025 16:47

py/objint_longlong: Move overflow checked long long multiply to misc.h.

4961137

Will be used in a follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>

py/smallint: Update mp_small_int_mul_overflow() to perform the multiply.

d21c410

Makes it compatible with the __builtin_mul_overflow() syntax, used in follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from 02112a4 to 8555e92 Compare May 2, 2025 06:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py: Fixes and test coverage for 64-bit big integer representations. #16953

py: Fixes and test coverage for 64-bit big integer representations. #16953

projectgus commented Mar 18, 2025 •

edited

Loading

projectgus Mar 18, 2025

dpgeorge May 2, 2025

codecov bot commented Mar 18, 2025 •

edited

Loading

github-actions bot commented Mar 18, 2025 •

edited

Loading

projectgus commented Mar 18, 2025 •

edited

Loading

projectgus commented Mar 19, 2025 •

edited

Loading

yoctopuce commented Mar 19, 2025 •

edited

Loading

projectgus commented Mar 19, 2025

projectgus commented Mar 19, 2025

yoctopuce commented Mar 19, 2025

projectgus commented Mar 26, 2025

yoctopuce commented Mar 26, 2025

dpgeorge May 2, 2025

projectgus May 2, 2025

yoctopuce May 2, 2025

projectgus May 2, 2025

dpgeorge May 2, 2025

dpgeorge May 2, 2025

yoctopuce May 2, 2025

projectgus May 2, 2025

projectgus May 2, 2025

yoctopuce May 2, 2025

projectgus May 4, 2025 •

edited

Loading

		// and passes tests.

		#define MICROPY_LONGINT_IMPL (MICROPY_LONGINT_IMPL_LONGLONG)

py: Fixes and test coverage for 64-bit big integer representations. #16953

Are you sure you want to change the base?

py: Fixes and test coverage for 64-bit big integer representations. #16953

Conversation

projectgus commented Mar 18, 2025 • edited Loading

Summary

Testing

Trade-offs and Alternatives

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Mar 18, 2025 • edited Loading

Codecov Report

github-actions bot commented Mar 18, 2025 • edited Loading

projectgus commented Mar 18, 2025 • edited Loading

projectgus commented Mar 19, 2025 • edited Loading

yoctopuce commented Mar 19, 2025 • edited Loading

projectgus commented Mar 19, 2025

projectgus commented Mar 19, 2025

yoctopuce commented Mar 19, 2025

projectgus commented Mar 26, 2025

yoctopuce commented Mar 26, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

projectgus May 4, 2025 • edited Loading

Choose a reason for hiding this comment

projectgus commented Mar 18, 2025 •

edited

Loading

codecov bot commented Mar 18, 2025 •

edited

Loading

github-actions bot commented Mar 18, 2025 •

edited

Loading

projectgus commented Mar 18, 2025 •

edited

Loading

projectgus commented Mar 19, 2025 •

edited

Loading

yoctopuce commented Mar 19, 2025 •

edited

Loading

projectgus May 4, 2025 •

edited

Loading