Skip to content

Fix int respect sys.set_int_max_str_digits value #6094

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions Lib/test/test_int.py
Original file line number Diff line number Diff line change
Expand Up @@ -712,8 +712,7 @@ def test_denial_of_service_prevented_int_to_str(self):
self.assertIn('conversion', str(err.exception))
self.assertLess(sw_fail_extra_huge.seconds, sw_convert.seconds/2)

# TODO: RUSTPYTHON
@unittest.expectedFailure
@unittest.expectedFailureIf(support.is_wasi, "TODO: RUSTPYTHON; module 'time' has no attribute 'get_clock_info'")
def test_denial_of_service_prevented_str_to_int(self):
"""Regression test: ensure we fail before performing O(N**2) work."""
maxdigits = sys.get_int_max_str_digits()
Expand Down
7 changes: 5 additions & 2 deletions Lib/test/test_json/test_decode.py
Original file line number Diff line number Diff line change
Expand Up @@ -127,8 +127,6 @@ def test_negative_index(self):
d = self.json.JSONDecoder()
self.assertRaises(ValueError, d.raw_decode, 'a'*42, -50000)

# TODO: RUSTPYTHON
@unittest.expectedFailure
def test_limit_int(self):
maxdigits = 5000
with support.adjust_int_max_str_digits(maxdigits):
Expand All @@ -144,3 +142,8 @@ class TestCDecode(TestDecode, CTest):
@unittest.expectedFailure
def test_keys_reuse(self):
return super().test_keys_reuse()

# TODO: RUSTPYTHON
@unittest.expectedFailure
def test_limit_int(self):
return super().test_limit_int()
11 changes: 10 additions & 1 deletion vm/src/protocol/number.rs
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,16 @@ impl PyObject {
))
})
} else if let Some(s) = self.downcast_ref::<PyStr>() {
try_convert(self, s.as_wtf8().trim().as_bytes(), vm)
let val = s.as_wtf8().trim().as_bytes();
let max_len = vm.state.int_max_str_digits.load();
if max_len > 0 {
let sign_len = usize::from(matches!(val.first(), Some(b'+' | b'-')));
let val_len = val.len();
if val_len > max_len + sign_len {
return Err(vm.new_value_error(format!("Exceeds the limit ({max_len} digits) for integer string conversion: value has {val_len} digits; use sys.set_int_max_str_digits() to increase the limit")));
}
}
try_convert(self, val, vm)
Comment on lines +79 to +88
Copy link
Contributor

@coderabbitai coderabbitai bot Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Apply the same digit-limit enforcement to bytes-like inputs (PyBytes, PyByteArray, ArgBytesLike).

Per CPython, sys.set_int_max_str_digits applies to “string or bytes-like” conversions. The current change only guards PyStr; PyBytes, PyByteArray, and other bytes-like paths bypass the limit.

Recommend extracting the check into a shared helper and invoking it for all three branches, e.g.:

fn enforce_int_max_str_digits(vm: &VirtualMachine, buf: &[u8]) -> PyResult<()> {
    let max_len = vm.state.int_max_str_digits.load();
    if max_len == 0 {
        return Ok(());
    }
    // Skip leading ASCII whitespace
    let mut i = 0usize;
    while i < buf.len() && buf[i].is_ascii_whitespace() {
        i += 1;
    }
    // Skip optional sign
    if i < buf.len() && matches!(buf[i], b'+' | b'-') {
        i += 1;
    }
    // Count ASCII digits; ignore underscores; stop at first non-digit/underscore
    let mut digits = 0usize;
    while i < buf.len() {
        match buf[i] {
            b'0'..=b'9' => {
                digits += 1;
                if digits > max_len {
                    return Err(vm.new_value_error(format!(
                        "Exceeds the limit ({max_len} digits) for integer string conversion: value has {digits} digits; use sys.set_int_max_str_digits() to increase the limit"
                    )));
                }
            }
            b'_' => {}
            _ => break,
        }
        i += 1;
    }
    Ok(())
}

Then:

  • In the PyStr branch, call enforce_int_max_str_digits(vm, val)?; before try_convert.
  • In the PyBytes branch:
    • let val = bytes; enforce_int_max_str_digits(vm, val)?; try_convert(self, val, vm)
  • In the PyByteArray branch:
    • let buf = bytearray.borrow_buf(); enforce_int_max_str_digits(vm, &buf)?; try_convert(self, &buf, vm)
  • In the ArgBytesLike branch:
    • let buf = buffer.borrow_buf(); enforce_int_max_str_digits(vm, &buf)?; try_convert(self, &buf, vm)

This aligns behavior with CPython and prevents bypassing the limit via bytes-like inputs.

🤖 Prompt for AI Agents
In vm/src/protocol/number.rs around lines 79 to 88, the current digit-limit
enforcement is only applied to PyStr inputs and lets
PyBytes/PyByteArray/ArgBytesLike bypass sys.set_int_max_str_digits; add a shared
helper (e.g., enforce_int_max_str_digits(vm, buf: &[u8]) -> PyResult<()>) that
reads vm.state.int_max_str_digits, skips leading ASCII whitespace, optionally
skips a leading '+'/'-', then counts ASCII digits while ignoring underscores and
stops at first non-digit/underscore, returning the same ValueError message when
digits exceed the limit; call this helper before try_convert in the PyStr branch
and also in the PyBytes, PyByteArray (use borrowed buffer), and ArgBytesLike
branches so bytes-like inputs are validated the same way as strings.

⚠️ Potential issue

Incorrect limit check: raw byte length counts non-digits and misreports; count actual digits (ignore sign and underscores).

The current check uses val.len() and adjusts for a leading sign. This will:

  • Over-count when underscores are present (valid numeric separators) and incorrectly raise a ValueError.
  • Misreport “digits” when the string contains non-digits (e.g., trailing junk, non-ASCII), leading to false positives.
  • Potentially report an inaccurate “value has {val_len} digits” number.

You should count only ASCII decimal digits, ignore a leading '+'/'-', ignore underscores, and stop at the first non-digit/underscore. Also, use the computed digit count in the error message.

Apply this diff within the changed block:

-            let val = s.as_wtf8().trim().as_bytes();
-            let max_len = vm.state.int_max_str_digits.load();
-            if max_len > 0 {
-                let sign_len = usize::from(matches!(val.first(), Some(b'+' | b'-')));
-                let val_len = val.len();
-                if val_len > max_len + sign_len {
-                    return Err(vm.new_value_error(format!("Exceeds the limit ({max_len} digits) for integer string conversion: value has {val_len} digits; use sys.set_int_max_str_digits() to increase the limit")));
-                }
-            }
-            try_convert(self, val, vm)
+            let val = s.as_wtf8().trim().as_bytes();
+            // Enforce sys.set_int_max_str_digits on the actual number of digits,
+            // ignoring an optional sign and valid underscores.
+            let max_len = vm.state.int_max_str_digits.load();
+            if max_len > 0 {
+                // Skip optional leading sign.
+                let mut iter = val.iter().copied();
+                if matches!(iter.clone().next(), Some(b'+' | b'-')) {
+                    iter.next();
+                }
+                // Count ASCII digits; ignore underscores; stop at first non-digit/underscore.
+                let mut digits = 0usize;
+                for b in iter {
+                    match b {
+                        b'0'..=b'9' => {
+                            digits += 1;
+                            if digits > max_len {
+                                return Err(vm.new_value_error(format!(
+                                    "Exceeds the limit ({max_len} digits) for integer string conversion: value has {digits} digits; use sys.set_int_max_str_digits() to increase the limit"
+                                )));
+                            }
+                        }
+                        b'_' => {}
+                        _ => break,
+                    }
+                }
+            }
+            try_convert(self, val, vm)
🤖 Prompt for AI Agents
In vm/src/protocol/number.rs around lines 79 to 88, the length check currently
uses the raw byte length (adjusted only for a leading sign), which over-counts
when underscores or non-digit bytes are present and misreports the number of
digits; instead, iterate bytes starting after an optional leading '+' or '-',
count only ASCII digits (ignore '_' underscores), stop counting at the first
byte that is neither a digit nor an underscore, and use that computed
digit_count for both the limit comparison and the error message (i.e., if
digit_count > max_len then return Err with a message that uses digit_count and
max_len). Ensure underscores are skipped in the count and that the sign is not
counted as a digit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShaharNaveh Thank you! I'm glad the analysis was helpful in identifying the digit counting issue. The fix should ensure proper compliance with sys.set_int_max_str_digits by accurately counting only the actual numeric digits while ignoring underscores and signs.

Let me know if you need any assistance with implementing the corrections for both the digit counting logic and extending the validation to bytes-like inputs!

🐰✨
Happy to help catch those tricky edge cases!
(\ /)
( ..)
o
(")(")


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

} else if let Some(bytes) = self.downcast_ref::<PyBytes>() {
try_convert(self, bytes, vm)
} else if let Some(bytearray) = self.downcast_ref::<PyByteArray>() {
Expand Down
Loading