py: Fixes and test coverage for 64-bit big integer representations. #16953

projectgus · 2025-03-18T06:57:06Z

Summary

As pointed out by @yoctopuce in this discussion and #16932, there is poor test coverage for the optional 64-bit bigint representation and this representation has some bugs.

This PR aims to improve this:

Add test naming convention specifically for 64-bit big integers. These will run if there is any big integer support (either 64-bit or arbitrary precision).
Duplicate the basic bigint tests to add int_64 equivalent.
Fix bug where negative 64-bit integers were incorrectly parsed.
Fix bug where 64-bit integer parsing produced invalid results if the buffer wasn't null terminated and the byte after the buffer was a valid digit. Fixes Incorrect parse of large integers in LONGLONG mode #16932. This incorporates the tests submitted in tests/extmod/json_loads: Add test cases for LONGINT parse. #16931 which seem to be a reliable way to get a string buffer which fits this edge case. However, the new tests are moved to a separate file so that the json tests don't depend on bigint support.
Add a longlong unix build variant that enables 64-bit long int mode. Mostly useful for CI testing.
Fix saturating behaviour when 64-bit integer parsing overflows, now it fails instead. This needed further filtering of tests as the ffi_int tests all depend on parsing UINT64_MAX or similar. This happened to work before this check was added, I believe as they got cast back to uint64 when passing to the FFI interface.
Raise OverflowError if an arithmetic operation overflows 64-bit signed integer, at least on gcc & clang.

Testing

Ran test suite on the new longlong variant on unix, looked for failures.

Trade-offs and Alternatives

We could deprecate this mode instead of improving support for it, but it does seem useful in small systems.
Correctly detecting overflow depends on errno which is a bit fraught in embedded systems. We could drop this commit out and accept the current saturating behaviour instead. The only way to correctly detect overflow without using strtoll and errno would be to add a dependency on sscanf or to re-implement strtoll ourselves (@yoctopuce suggested that when building in 64-bit mode we compile mp_parse_num_integer to parse 64-bit integers, which could be quite a neat solution.)

py/objint_longlong.c

projectgus · 2025-03-18T07:02:32Z

ports/unix/variants/longlong/mpconfigvariant.h

+// and passes tests.
+
+#define MICROPY_LONGINT_IMPL           (MICROPY_LONGINT_IMPL_LONGLONG)
+


@yoctopuce suggested we test MICROPY_OBJ_REPR_C and MICROPY_LONGINT_IMPL_LONGLONG together, which makes sense - and I don't think we have CI coverage for the former at the moment.

@dpgeorge what do you think?

Yes, that's a very good idea to test MICROPY_OBJ_REPR_C in CI.

It's enabled on esp8266 and stm32 Arduino boards, so it does get tested for a release. But would be great to test it in CI.

Enabling representation C in this new longlong variant is a good idea!

MICROPY_OBJ_REPR_C seems to need more work than just enabling it for this config, so I'll look at that for a follow-up PR.

codecov · 2025-03-18T07:04:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base (45e4deb) to head (fd40fb1).

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #16953   +/-   ##
=======================================
  Coverage   98.54%   98.54%           
=======================================
  Files         169      169           
  Lines       21898    21904    +6     
=======================================
+ Hits        21580    21586    +6     
  Misses        318      318

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2025-03-18T07:07:59Z

Code size report:

   bare-arm:   +20 +0.035% 
minimal x86:   +25 +0.013% 
   unix x64:    -8 -0.001% standard
      stm32:   +28 +0.007% PYBV10
     mimxrt:   +24 +0.006% TEENSY40
        rp2:   +40 +0.004% RPI_PICO_W
       samd:   +24 +0.009% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    -5 -0.001% VIRT_RV32

projectgus · 2025-03-18T23:39:00Z

The only way to correctly detect overflow without using strtoll and errno would be to add a dependency on sscanf or to re-implement strtoll ourselves

The other thing that might be worth considering here is 64-bit arithmetic overflow, should this raise OverflowError? (Currently arithmetic operations return the overflow result, which is UB in C - but I think on all our architectures it will wrap.)

projectgus · 2025-03-19T00:46:40Z

This is a bit troubling, my local longlong builds pass tests but now the CI build is failing them. And I think the CI test run definitely got further than this before...

yoctopuce · 2025-03-19T06:48:13Z

This is a bit troubling, my local longlong builds pass tests but now the CI build is failing them. And I think the CI test run definitely got further than this before...

The code only fails when run through mpy-cross (test option --via-mpy)
The cause is the very last test, which is causing an undetected overflow:

x = 1 << 63

2^63 is not a valid positive long long, but it is silently interpreted as -2^63 when interpreted directly in MicroPython.

When using --via-mpy, mpy-cross uses mpz bigints and can therefore create valid a constant object with the true decimal representation of 2^63 (thanks to MICROPY_COMP_CONST_FOLDING). When the mpy file is loaded, the runtime engine fails to create the constant object and fails with the ValueError....

projectgus · 2025-03-19T07:13:04Z

The code only fails when run through mpy-cross (test option --via-mpy) The cause is the very last test, which is causing an undetected overflow:

Ah of course, thanks!

projectgus · 2025-03-19T07:15:29Z

2^63 is not a valid positive long long, but it is silently interpreted as -2^63 when interpreted directly in MicroPython.

I feel like this difference in behaviour is a good example of why we might consider changing the 64-bit arithmetic operations to raise on overflow.

yoctopuce · 2025-03-19T07:19:23Z

2^63 is not a valid positive long long, but it is silently interpreted as -2^63 when interpreted directly in MicroPython.

I feel like this difference in behaviour is a good example of why we might consider changing the 64-bit arithmetic operations to raise on overflow.

I'm totally with you

projectgus · 2025-03-26T02:40:25Z

This PR now includes arithmetic overflow checks for 64-bit bigints!

yoctopuce · 2025-03-26T05:40:28Z

This PR now includes arithmetic overflow checks for 64-bit bigints!

Excellent, thank you !

dpgeorge · 2025-05-02T03:22:53Z

py/objint_longlong.c

@@ -25,6 +25,7 @@
 * THE SOFTWARE.
 */

+#include <errno.h>


Yeah, it's a bit of a nuisance to need to include this header... it sometimes doesn't exist on embedded systems.

At the very least I'd suggest moving this include to within the MICROPY_LONGINT_IMPL == MICROPY_LONGINT_IMPL_LONGLONG guard, so it's only included if needed (because this file will be compiled on all systems regardless of the config).

In the long run, would be best to remove dependency on errno, but I'm not sure it's worth the effort given this config is likely not used much.

I've moved it down behind the preprocessor guard.

I did start writing a version that doesn't depend on strtoll, using the approach @yoctopuce suggested for mp_parse_num_integer() to handle 64-bit. It has potential to work quite nicely, but it wasn't particularly simple either so I shelved it for now.

Here is my version, in case it helps:

const byte* restrict str = (const byte*)*str_; const byte* restrict top = str + len; long long int_val = 0; for (; str < top; str++) { // get next digit as a value mp_uint_t dig = *str; if ('0' <= dig && dig <= '9') { dig -= '0'; } else if (dig == '_') { continue; } else { dig |= 0x20; // make digit lower-case if ('a' <= dig && dig <= 'z') { dig -= 'a' - 10; } else { // unknown character break; } } if (dig >= (mp_uint_t)base) { break; } // add next digit int_val = int_val * base + dig; } // negate value if needed if (neg) { int_val = -int_val; } mp_obj_t result = mp_obj_new_int_from_ll(int_val); *str_ = (char*)str;

I got inspired to go and finish the version of this I was working on, have updated this PR with the result.

dpgeorge · 2025-05-02T03:25:12Z

py/objint_longlong.c

@@ -144,9 +148,22 @@ mp_obj_t mp_obj_int_unary_op(mp_unary_op_t op, mp_obj_t o_in) {
    }
 }

+// 64-bit binary operations have overflow checking when supported by the compiler,
+// multiply is used in a few places
+#if __has_builtin(__builtin_mul_overflow)


I think this needs to be wrapped in #if defined __has_builtin. See py/misc.h and https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fbuiltin.html

Or follow lib/cmsis/inc/cmsis_gcc.h:

#ifndef __has_builtin #define __has_builtin(x) (0) #endif

I also hit that __has_builtin issue when integrating this commit in my dev branch, and as __has_builtin() is sometime undefined on compilers that do have support for __builtin_*_overflow, I ended up using this code:

#if defined(_MSC_VER) #define ENABLE_INTSAFE_SIGNED_FUNCTIONS #include <intsafe.h> #define add_overflow(lhs,rhs,res) LongLongAdd(lhs, rhs, res) #define sub_overflow(lhs,rhs,res) LongLongSub(lhs, rhs, res) #define mul_overflow(lhs,rhs,res) LongLongMult(lhs,rhs,res) #elif defined(__EMSCRIPTEN__ ) || (defined(__GNUC__) && __GNUC__ >= 5) #define add_overflow(lhs,rhs,res) __builtin_add_overflow(lhs, rhs, res) #define sub_overflow(lhs,rhs,res) __builtin_sub_overflow(lhs, rhs, res) #define mul_overflow(lhs,rhs,res) __builtin_mul_overflow(lhs, rhs, res) #endif #if !defined(add_overflow) #include <limits.h> __inline bool add_overflow(long long lhs, long long rhs, long long *result) { // Check for overflow if ((rhs > 0) && (lhs > LLONG_MAX - rhs)) { return 1; // Overflow } if ((rhs < 0) && (lhs < LLONG_MIN - rhs)) { return 1; // Overflow } *result = lhs + rhs; return 0; // No overflow } __inline bool sub_overflow(long long lhs, long long rhs, long long *result) { // Check for overflow if ((rhs > 0) && (lhs < LLONG_MIN + rhs)) { return 1; // Overflow } if ((rhs < 0) && (lhs > LLONG_MAX + rhs)) { return 1; // Overflow } *result = lhs - rhs; return 0; // No overflow } __inline bool mul_overflow(long long lhs, long long rhs, long long * result) { if (lhs > 0) { if (rhs > 0) { if (lhs > (LLONG_MAX / rhs)) { return 1; // Overflow } } else { if (rhs < (LLONG_MIN / lhs)) { return 1; // Overflow } } } else { if (rhs > 0) { if (lhs < (LLONG_MIN / rhs)) { return 1; // Overflow } } else { if ((lhs != 0) && (rhs <= (LLONG_MAX / lhs))) { return 1; // Overflow } } } *result = lhs * rhs; return 0; // No overflow } #endif

My embedded platforms are all using GCC >= 5, so the bigger+slower inline code only ends up in OS-based test builds.

Impressive! @dpgeorge do you think it's worth pulling any/all of this into MP, or stick with __has_builtin and fallback to not checking overflow? (__has_builtin is GCC 10 and up, first released 2020.)

(I've added the check for defined __has_builtin already.)

If we rely only on __has_builtin, we are going to significantly increase code size for all compiler version between 2015 ( GCC 5+) and 2020 (GCC 10+).

I think the impact is not that bad. If I build the Zephyr nucleo_wb55rg board with this branch (and gcc 12) I get 242792 bytes flash size.

If I manually patch the three #if __has_builtin(*overflow) lines to #if 0 then I get 242800 bytes flash size.

Admittedly, there is now no overflow check for addition and subtraction and it's possible GCC 12 has peephole optimised the manual mp_mul_ll_overflow function to an overflow check in a way that older GCC can't detect.

It is a bit odd that GCC version determines whether or not an overflow is an exception, though. I'll see how hard it is to patch around that further.

I've updated the implementation so the overflow checks are consistent regardless of whether __has_builtin(*overflow) is true or not. Based on your code, so thanks for that! I think my version of the manual implementations are all sound...

The Zephyr build still only goes up by 8 bytes in code size if I swap from the built-in overflow checked ops to the manual ones, though. Even on GCC 5.x the difference in size seems neglible: https://gcc.godbolt.org/z/jhhn16dW6 (One key difference may be that the builtins return a correct "infinite precision" result even in case of overflow, whereas the manual functions can fall back on UB for overflow.)

On balance I'd rather keep the checks simple and fallback to the manual implementations rather than keep adding per-compiler tweaks here. Unless there's a case where it really does balloon out the code size.

@yoctopuce I'm sorry, I missed your reply from 3 days ago - GitHub hadn't loaded it into the tab I still had open so I just saw it now.

That's interesting you see such a big difference by comparison to the things I checked against. Can put in the suggested extra check for built-ins, then.

@yoctopuce Added the explicit GCC version check. Curious how the version that's there now compares to your tests. (It's still possible I'm doing something differently or incorrectly in a way that's impacting the code size difference.)

Argh, clearly a case of "more haste less speed" today.

The Zephyr port code size doesn't change by much because it isn't using MICROPY_LONGINT_IMPL_LONG_LONG. I don't know why I thought it did (or why a no-op source change causes an 8 byte difference in code size).

Updated the PR again to rely on the behaviour that an undefined macro has value 0, so we don't need to recursively expand a defined clause.

py/objint_longlong.c

tools/ci.sh

These will run on all ports which support them, but importantly they'll also run on ports that don't support arbitrary precision but do support 64-bit long ints. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

Signed-off-by: Angus Gratton <angus@redyak.com.au>

These tests cover the use of mp_obj_new_int_from_str_len when mp_parse_num_integer overflows the SMALLINT limit. Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>

Leans on the test case added in the parent commit, as json.loads() seems to be a trivial way to cause this. The test case is moved to an '_int_64' variant so the json_loads.py test can still run on builds without large int support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

Signed-off-by: Angus Gratton <angus@redyak.com.au>

This relies on errno unfortunately, but there's no other way to disambiguate MAX_LONGLONG and an out of range value (apart from manually re-implementing strtoll). Includes some test workarounds to account for things which now overflow: - uctypes_array_load_store test was failing already, now won't parse. - all the ffi_int tests contain 64-bit unsigned values, that won't parse as long long. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>

Relies on arbitrary precision math, so won't run on a port which has threads & limited bigint support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>

The other performance tests run and pass with only 64-bit big integer support. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au> Signed-off-by: Angus Gratton <angus@redyak.com.au>

Enabled when supported via compiler builtins (i.e. gcc/clang). Also adds an error when shifting by a negative value. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

Will be used in a follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>

Makes it compatible with the __builtin_mul_overflow() syntax, used in follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>

If long integer support is 'long long' then mp_parse_num_integer() can parse to it directly instead of failing over from small int. One tricky case this correctly overflows on is things like int("9223372036854775808") which is one more than LLONG_MAX in decimal. No unit test case added for this as it's too hard to detect 64-bit long integer mode. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

projectgus added the py-core Relates to py/ directory in source label Mar 18, 2025

projectgus commented Mar 18, 2025

View reviewed changes

py/objint_longlong.c Outdated Show resolved Hide resolved

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from ec883a8 to c6f3856 Compare March 18, 2025 06:59

projectgus commented Mar 18, 2025

View reviewed changes

projectgus mentioned this pull request Mar 18, 2025

tests/extmod/json_loads: Add test cases for LONGINT parse. #16931

Closed

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 2 times, most recently from 0dd8885 to 5d8e0c1 Compare March 18, 2025 23:36

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 2 times, most recently from f8acf08 to c862fb1 Compare March 19, 2025 00:22

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from c862fb1 to aa0917b Compare March 19, 2025 07:14

projectgus added this to the release-1.26.0 milestone Mar 26, 2025

dpgeorge reviewed May 2, 2025

View reviewed changes

py/objint_longlong.c Outdated Show resolved Hide resolved

dpgeorge reviewed May 2, 2025

View reviewed changes

tools/ci.sh Outdated Show resolved Hide resolved

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 4 times, most recently from 02112a4 to 8555e92 Compare May 2, 2025 06:47

projectgus added 8 commits May 8, 2025 10:23

py/objint_longlong: Support parsing negative 64-bit integers.

1a2b365

Signed-off-by: Angus Gratton <angus@redyak.com.au>

tests/extmod/json_loads: Add test cases for LONGINT parse.

eb1eb48

These tests cover the use of mp_obj_new_int_from_str_len when mp_parse_num_integer overflows the SMALLINT limit. Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>

unix,tools/ci: Add a 'longlong' variant to test 64-bit bigints in CI.

4a63026

Signed-off-by: Angus Gratton <angus@redyak.com.au>

py/objint_longlong: Add arithmetic overflow checks.

8bd3785

Enabled when supported via compiler builtins (i.e. gcc/clang). Also adds an error when shifting by a negative value. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 2 times, most recently from 3b737ad to 3d14d08 Compare May 8, 2025 01:09

projectgus added 2 commits May 8, 2025 11:12

py/objint_longlong: Move overflow checked long long ops to misc.h.

8b6a858

Will be used in a follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>

py/smallint: Update mp_small_int_mul_overflow() to perform the multiply.

61eca4c

Makes it compatible with the __builtin_mul_overflow() syntax, used in follow-up commit. Signed-off-by: Angus Gratton <angus@redyak.com.au>

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from 3d14d08 to 1962f40 Compare May 8, 2025 01:14

projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from 1962f40 to fd40fb1 Compare May 8, 2025 06:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py: Fixes and test coverage for 64-bit big integer representations. #16953

py: Fixes and test coverage for 64-bit big integer representations. #16953

projectgus commented Mar 18, 2025 •

edited

Loading

projectgus Mar 18, 2025

dpgeorge May 2, 2025

projectgus May 8, 2025

codecov bot commented Mar 18, 2025 •

edited

Loading

github-actions bot commented Mar 18, 2025 •

edited

Loading

projectgus commented Mar 18, 2025 •

edited

Loading

projectgus commented Mar 19, 2025 •

edited

Loading

yoctopuce commented Mar 19, 2025 •

edited

Loading

projectgus commented Mar 19, 2025

projectgus commented Mar 19, 2025

yoctopuce commented Mar 19, 2025

projectgus commented Mar 26, 2025

yoctopuce commented Mar 26, 2025

dpgeorge May 2, 2025

projectgus May 2, 2025

yoctopuce May 2, 2025

projectgus May 2, 2025

dpgeorge May 2, 2025

dpgeorge May 2, 2025

yoctopuce May 2, 2025

projectgus May 2, 2025

projectgus May 2, 2025

projectgus May 8, 2025 •

edited

Loading

projectgus May 8, 2025

projectgus May 8, 2025

projectgus May 8, 2025

projectgus May 8, 2025 •

edited

Loading

		// and passes tests.

		#define MICROPY_LONGINT_IMPL (MICROPY_LONGINT_IMPL_LONGLONG)

py: Fixes and test coverage for 64-bit big integer representations. #16953

Are you sure you want to change the base?

py: Fixes and test coverage for 64-bit big integer representations. #16953

Conversation

projectgus commented Mar 18, 2025 • edited Loading

Summary

Testing

Trade-offs and Alternatives

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Mar 18, 2025 • edited Loading

Codecov Report

github-actions bot commented Mar 18, 2025 • edited Loading

projectgus commented Mar 18, 2025 • edited Loading

projectgus commented Mar 19, 2025 • edited Loading

yoctopuce commented Mar 19, 2025 • edited Loading

projectgus commented Mar 19, 2025

projectgus commented Mar 19, 2025

yoctopuce commented Mar 19, 2025

projectgus commented Mar 26, 2025

yoctopuce commented Mar 26, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

projectgus May 8, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

projectgus May 8, 2025 • edited Loading

Choose a reason for hiding this comment

projectgus commented Mar 18, 2025 •

edited

Loading

codecov bot commented Mar 18, 2025 •

edited

Loading

github-actions bot commented Mar 18, 2025 •

edited

Loading

projectgus commented Mar 18, 2025 •

edited

Loading

projectgus commented Mar 19, 2025 •

edited

Loading

yoctopuce commented Mar 19, 2025 •

edited

Loading

projectgus May 8, 2025 •

edited

Loading

projectgus May 8, 2025 •

edited

Loading