-
Notifications
You must be signed in to change notification settings - Fork 220
Add TinyAsciiStr::new_decimal_usize #6912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. |
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a helpful utility function new_decimal_usize
. However, I've found a critical issue in the implementation where it incorrectly handles the input 0
, returning an empty string instead of "0"
. I've provided a detailed review comment with a suggested code replacement that fixes this bug by ensuring the conversion loop runs at least once. The suggested fix also handles the edge case of N=0
more robustly.
utils/tinystr/src/ascii.rs
Outdated
pub fn new_decimal_usize(number: usize) -> Result<Self, Self> { | ||
let mut bytes = [0; N]; | ||
let mut x = number; | ||
let mut i = 0usize; | ||
#[expect(clippy::indexing_slicing)] // in-range: i < N | ||
while x != 0 && i < N { | ||
bytes[N - i - 1] = ((x % 10) as u8) + b'0'; | ||
x /= 10; | ||
i += 1; | ||
} | ||
#[expect(clippy::indexing_slicing)] // in-range: i < N | ||
if i < N { | ||
bytes.copy_within((N - i)..N, 0); | ||
bytes[i..N].fill(0); | ||
} | ||
let s = Self { | ||
// SAFETY: `out` only contains ASCII bytes and has same size as `self.bytes` | ||
bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) }, | ||
}; | ||
if x != 0 { | ||
Err(s) | ||
} else { | ||
Ok(s) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation has a bug when number
is 0
. The while x != 0
condition causes the loop to be skipped, which results in an empty string being returned instead of the expected "0"
. This can lead to silent errors in logic that relies on this conversion.
To fix this, I recommend refactoring the function to handle the N=0
case explicitly and then use a loop
that executes at least once (simulating a do-while loop) to correctly process 0
and all other numbers. This approach is cleaner and more robust than special-casing number == 0
.
pub fn new_decimal_usize(number: usize) -> Result<Self, Self> {
if N == 0 {
let s = Self {
// SAFETY: N is 0, so this is an empty array.
bytes: unsafe { AsciiByte::to_ascii_byte_array(&[0; 0]) },
};
// Any number, including 0, requires at least 1 digit, so it's an error for N=0.
return Err(s);
}
let mut bytes = [0; N];
let mut x = number;
let mut i = 0usize;
// This loop simulates a do-while to ensure it runs at least once for number = 0.
// The indexing is safe because we've handled N=0 and the loop breaks when i >= N.
#[expect(clippy::indexing_slicing)]
loop {
bytes[N - i - 1] = ((x % 10) as u8) + b'0';
x /= 10;
i += 1;
if x == 0 || i >= N {
break;
}
}
#[expect(clippy::indexing_slicing)] // in-range: i <= N
if i < N {
bytes.copy_within((N - i)..N, 0);
bytes[i..N].fill(0);
}
let s = Self {
// SAFETY: `bytes` only contains ASCII digits and nulls, and has the correct size.
bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) },
};
if x != 0 {
Err(s)
} else {
Ok(s)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this was an actual bug. I added a test case for number == 0 and implemented a fix, although my fix is not what you suggested.
utils/tinystr/src/ascii.rs
Outdated
/// assert_eq!(str_truncated, tinystr!(2, "56")); | ||
/// assert_eq!(str_saturated, tinystr!(2, "99")); | ||
/// ``` | ||
pub fn new_decimal_usize(number: usize) -> Result<Self, Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: pick a real integer type
nit 2: probably just new_decimal or new_unsigned_decimal? We most likely won't need more methods.
utils/tinystr/src/ascii.rs
Outdated
} | ||
let s = Self { | ||
// SAFETY: `out` only contains ASCII bytes and has same size as `self.bytes` | ||
bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this is a complex function, and the invariant is hard to track
Instead, we can write this function safely, and have it call AsciiByte::from_digit()
. from_digit()
has a debug-assertion invariant that it is given a digit, and it does a bitmask-and-add to produce an ASCII byte.
You can also write it fully safely using a match
, and it should compile to a jump table, but it's understandable if you're not sure of that. Play around with it in godbolt, perhaps.
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a helper function new_unsigned_decimal
to the TinyAsciiStr
type for creating ASCII strings from decimal representations of unsigned integers. This functionality was previously only possible using unwrapping or unsafe code outside the tinystr crate.
- Adds
AsciiByte::from_decimal_digit
helper function for converting single digits to ASCII bytes - Implements
TinyAsciiStr::new_unsigned_decimal
for creating decimal string representations - Includes comprehensive documentation and examples for the new functionality
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
utils/tinystr/src/asciibyte.rs | Adds documentation comment and implements from_decimal_digit helper function |
utils/tinystr/src/ascii.rs | Implements new_unsigned_decimal method with full documentation and examples |
Comments suppressed due to low confidence (1)
utils/tinystr/src/ascii.rs:1
- The example shows 'saturating' behavior, but the function doesn't actually saturate - it truncates the high-order digits. The variable name 'str_saturated' is misleading since it's manually providing a fallback value rather than the function performing saturation.
// This file is part of ICU4X. For terms of use, please see the file
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
} | ||
if i < N { | ||
bytes.copy_within((N - i)..N, 0); | ||
bytes[i..N].fill(AsciiByte::B0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line fills unused bytes with null characters (B0) instead of space characters, which could result in strings with embedded nulls. This is likely incorrect for ASCII string representation where unused space should be filled with spaces or the string should be properly null-terminated.
bytes[i..N].fill(AsciiByte::B0); | |
bytes[i..N].fill(AsciiByte::SPACE); |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No Copilot, that isn't how TinyAsciiStr works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a new helper function TinyAsciiStr::new_unsigned_decimal
to create a TinyAsciiStr
from an unsigned integer. This is a useful addition, as it encapsulates logic that would be tricky to implement correctly outside the crate. The implementation is clever and handles edge cases like 0
and numbers that don't fit into the string's capacity. The documentation and examples are clear and helpful.
I've made one suggestion in asciibyte.rs
to improve code conciseness.
Also, note that the PR title mentions new_decimal_usize
, but the implemented function is new_unsigned_decimal
which takes a u32
. This seems like a good choice, but you might want to update the PR title to reflect the final implementation.
pub(crate) fn from_decimal_digit(digit: u8) -> AsciiByte { | ||
// Note: This code optimizes nicely with no branches. | ||
match digit { | ||
0 => AsciiByte::B48, | ||
1 => AsciiByte::B49, | ||
2 => AsciiByte::B50, | ||
3 => AsciiByte::B51, | ||
4 => AsciiByte::B52, | ||
5 => AsciiByte::B53, | ||
6 => AsciiByte::B54, | ||
7 => AsciiByte::B55, | ||
8 => AsciiByte::B56, | ||
9 => AsciiByte::B57, | ||
_ => { | ||
debug_assert!(false, "not a single digit: {digit}"); | ||
AsciiByte::B32 // Space | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The match
statement is a bit verbose for this conversion. You can achieve the same result more concisely and idiomatically by using an array lookup.
pub(crate) fn from_decimal_digit(digit: u8) -> AsciiByte {
const DIGITS: [AsciiByte; 10] = [
AsciiByte::B48, AsciiByte::B49, AsciiByte::B50, AsciiByte::B51, AsciiByte::B52,
AsciiByte::B53, AsciiByte::B54, AsciiByte::B55, AsciiByte::B56, AsciiByte::B57,
];
DIGITS.get(digit as usize).copied().unwrap_or_else(|| {
debug_assert!(false, "not a single digit: {digit}");
AsciiByte::B32 // Space
})
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh, I think the match statement is more readable.
It requires unwrapping or unsafe code to do it outside of the tinystr crate, so I thought I would add this little helper function.
/gemini review