Skip to content

Conversation

sffc
Copy link
Member

@sffc sffc commented Aug 30, 2025

It requires unwrapping or unsafe code to do it outside of the tinystr crate, so I thought I would add this little helper function.

/gemini review

@sffc sffc requested review from Manishearth and Copilot August 30, 2025 00:39
@sffc sffc requested a review from zbraniecki as a code owner August 30, 2025 00:39
Copy link

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

Copilot

This comment was marked as outdated.

@sffc
Copy link
Member Author

sffc commented Aug 30, 2025

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a helpful utility function new_decimal_usize. However, I've found a critical issue in the implementation where it incorrectly handles the input 0, returning an empty string instead of "0". I've provided a detailed review comment with a suggested code replacement that fixes this bug by ensuring the conversion loop runs at least once. The suggested fix also handles the edge case of N=0 more robustly.

Comment on lines 252 to 276
pub fn new_decimal_usize(number: usize) -> Result<Self, Self> {
let mut bytes = [0; N];
let mut x = number;
let mut i = 0usize;
#[expect(clippy::indexing_slicing)] // in-range: i < N
while x != 0 && i < N {
bytes[N - i - 1] = ((x % 10) as u8) + b'0';
x /= 10;
i += 1;
}
#[expect(clippy::indexing_slicing)] // in-range: i < N
if i < N {
bytes.copy_within((N - i)..N, 0);
bytes[i..N].fill(0);
}
let s = Self {
// SAFETY: `out` only contains ASCII bytes and has same size as `self.bytes`
bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) },
};
if x != 0 {
Err(s)
} else {
Ok(s)
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The current implementation has a bug when number is 0. The while x != 0 condition causes the loop to be skipped, which results in an empty string being returned instead of the expected "0". This can lead to silent errors in logic that relies on this conversion.

To fix this, I recommend refactoring the function to handle the N=0 case explicitly and then use a loop that executes at least once (simulating a do-while loop) to correctly process 0 and all other numbers. This approach is cleaner and more robust than special-casing number == 0.

    pub fn new_decimal_usize(number: usize) -> Result<Self, Self> {
        if N == 0 {
            let s = Self {
                // SAFETY: N is 0, so this is an empty array.
                bytes: unsafe { AsciiByte::to_ascii_byte_array(&[0; 0]) },
            };
            // Any number, including 0, requires at least 1 digit, so it's an error for N=0.
            return Err(s);
        }

        let mut bytes = [0; N];
        let mut x = number;
        let mut i = 0usize;
        // This loop simulates a do-while to ensure it runs at least once for number = 0.
        // The indexing is safe because we've handled N=0 and the loop breaks when i >= N.
        #[expect(clippy::indexing_slicing)]
        loop {
            bytes[N - i - 1] = ((x % 10) as u8) + b'0';
            x /= 10;
            i += 1;
            if x == 0 || i >= N {
                break;
            }
        }

        #[expect(clippy::indexing_slicing)] // in-range: i <= N
        if i < N {
            bytes.copy_within((N - i)..N, 0);
            bytes[i..N].fill(0);
        }
        let s = Self {
            // SAFETY: `bytes` only contains ASCII digits and nulls, and has the correct size.
            bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) },
        };
        if x != 0 {
            Err(s)
        } else {
            Ok(s)
        }
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this was an actual bug. I added a test case for number == 0 and implemented a fix, although my fix is not what you suggested.

Manishearth
Manishearth previously approved these changes Aug 30, 2025
/// assert_eq!(str_truncated, tinystr!(2, "56"));
/// assert_eq!(str_saturated, tinystr!(2, "99"));
/// ```
pub fn new_decimal_usize(number: usize) -> Result<Self, Self> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: pick a real integer type

nit 2: probably just new_decimal or new_unsigned_decimal? We most likely won't need more methods.

}
let s = Self {
// SAFETY: `out` only contains ASCII bytes and has same size as `self.bytes`
bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this is a complex function, and the invariant is hard to track

Instead, we can write this function safely, and have it call AsciiByte::from_digit(). from_digit() has a debug-assertion invariant that it is given a digit, and it does a bitmask-and-add to produce an ASCII byte.

You can also write it fully safely using a match, and it should compile to a jump table, but it's understandable if you're not sure of that. Play around with it in godbolt, perhaps.

@sffc
Copy link
Member Author

sffc commented Aug 30, 2025

/gemini review

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a helper function new_unsigned_decimal to the TinyAsciiStr type for creating ASCII strings from decimal representations of unsigned integers. This functionality was previously only possible using unwrapping or unsafe code outside the tinystr crate.

  • Adds AsciiByte::from_decimal_digit helper function for converting single digits to ASCII bytes
  • Implements TinyAsciiStr::new_unsigned_decimal for creating decimal string representations
  • Includes comprehensive documentation and examples for the new functionality

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
utils/tinystr/src/asciibyte.rs Adds documentation comment and implements from_decimal_digit helper function
utils/tinystr/src/ascii.rs Implements new_unsigned_decimal method with full documentation and examples
Comments suppressed due to low confidence (1)

utils/tinystr/src/ascii.rs:1

  • The example shows 'saturating' behavior, but the function doesn't actually saturate - it truncates the high-order digits. The variable name 'str_saturated' is misleading since it's manually providing a fallback value rather than the function performing saturation.
// This file is part of ICU4X. For terms of use, please see the file

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

}
if i < N {
bytes.copy_within((N - i)..N, 0);
bytes[i..N].fill(AsciiByte::B0);
Copy link
Preview

Copilot AI Aug 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line fills unused bytes with null characters (B0) instead of space characters, which could result in strings with embedded nulls. This is likely incorrect for ASCII string representation where unused space should be filled with spaces or the string should be properly null-terminated.

Suggested change
bytes[i..N].fill(AsciiByte::B0);
bytes[i..N].fill(AsciiByte::SPACE);

Copilot uses AI. Check for mistakes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Copilot, that isn't how TinyAsciiStr works.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a new helper function TinyAsciiStr::new_unsigned_decimal to create a TinyAsciiStr from an unsigned integer. This is a useful addition, as it encapsulates logic that would be tricky to implement correctly outside the crate. The implementation is clever and handles edge cases like 0 and numbers that don't fit into the string's capacity. The documentation and examples are clear and helpful.

I've made one suggestion in asciibyte.rs to improve code conciseness.

Also, note that the PR title mentions new_decimal_usize, but the implemented function is new_unsigned_decimal which takes a u32. This seems like a good choice, but you might want to update the PR title to reflect the final implementation.

Comment on lines +151 to +169
pub(crate) fn from_decimal_digit(digit: u8) -> AsciiByte {
// Note: This code optimizes nicely with no branches.
match digit {
0 => AsciiByte::B48,
1 => AsciiByte::B49,
2 => AsciiByte::B50,
3 => AsciiByte::B51,
4 => AsciiByte::B52,
5 => AsciiByte::B53,
6 => AsciiByte::B54,
7 => AsciiByte::B55,
8 => AsciiByte::B56,
9 => AsciiByte::B57,
_ => {
debug_assert!(false, "not a single digit: {digit}");
AsciiByte::B32 // Space
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The match statement is a bit verbose for this conversion. You can achieve the same result more concisely and idiomatically by using an array lookup.

    pub(crate) fn from_decimal_digit(digit: u8) -> AsciiByte {
        const DIGITS: [AsciiByte; 10] = [
            AsciiByte::B48, AsciiByte::B49, AsciiByte::B50, AsciiByte::B51, AsciiByte::B52,
            AsciiByte::B53, AsciiByte::B54, AsciiByte::B55, AsciiByte::B56, AsciiByte::B57,
        ];
        DIGITS.get(digit as usize).copied().unwrap_or_else(|| {
            debug_assert!(false, "not a single digit: {digit}");
            AsciiByte::B32 // Space
        })
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, I think the match statement is more readable.

@sffc sffc merged commit 77c6fd7 into unicode-org:main Aug 30, 2025
32 checks passed
@sffc sffc deleted the new_decimal_usize branch August 30, 2025 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants