Add TinyAsciiStr::new_decimal_usize #6912

sffc · 2025-08-30T00:39:01Z

It requires unwrapping or unsafe code to do it outside of the tinystr crate, so I thought I would add this little helper function.

/gemini review

gemini-code-assist · 2025-08-30T00:39:04Z

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

sffc · 2025-08-30T00:41:53Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a helpful utility function new_decimal_usize. However, I've found a critical issue in the implementation where it incorrectly handles the input 0, returning an empty string instead of "0". I've provided a detailed review comment with a suggested code replacement that fixes this bug by ensuring the conversion loop runs at least once. The suggested fix also handles the edge case of N=0 more robustly.

gemini-code-assist · 2025-08-30T00:43:46Z

utils/tinystr/src/ascii.rs

+    pub fn new_decimal_usize(number: usize) -> Result<Self, Self> {
+        let mut bytes = [0; N];
+        let mut x = number;
+        let mut i = 0usize;
+        #[expect(clippy::indexing_slicing)] // in-range: i < N
+        while x != 0 && i < N {
+            bytes[N - i - 1] = ((x % 10) as u8) + b'0';
+            x /= 10;
+            i += 1;
+        }
+        #[expect(clippy::indexing_slicing)] // in-range: i < N
+        if i < N {
+            bytes.copy_within((N - i)..N, 0);
+            bytes[i..N].fill(0);
+        }
+        let s = Self {
+            // SAFETY: `out` only contains ASCII bytes and has same size as `self.bytes`
+            bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) },
+        };
+        if x != 0 {
+            Err(s)
+        } else {
+            Ok(s)
+        }
+    }


The current implementation has a bug when number is 0. The while x != 0 condition causes the loop to be skipped, which results in an empty string being returned instead of the expected "0". This can lead to silent errors in logic that relies on this conversion.

To fix this, I recommend refactoring the function to handle the N=0 case explicitly and then use a loop that executes at least once (simulating a do-while loop) to correctly process 0 and all other numbers. This approach is cleaner and more robust than special-casing number == 0.

pub fn new_decimal_usize(number: usize) -> Result<Self, Self> { if N == 0 { let s = Self { // SAFETY: N is 0, so this is an empty array. bytes: unsafe { AsciiByte::to_ascii_byte_array(&[0; 0]) }, }; // Any number, including 0, requires at least 1 digit, so it's an error for N=0. return Err(s); } let mut bytes = [0; N]; let mut x = number; let mut i = 0usize; // This loop simulates a do-while to ensure it runs at least once for number = 0. // The indexing is safe because we've handled N=0 and the loop breaks when i >= N. #[expect(clippy::indexing_slicing)] loop { bytes[N - i - 1] = ((x % 10) as u8) + b'0'; x /= 10; i += 1; if x == 0 || i >= N { break; } } #[expect(clippy::indexing_slicing)] // in-range: i <= N if i < N { bytes.copy_within((N - i)..N, 0); bytes[i..N].fill(0); } let s = Self { // SAFETY: `bytes` only contains ASCII digits and nulls, and has the correct size. bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) }, }; if x != 0 { Err(s) } else { Ok(s) } }

Thank you, this was an actual bug. I added a test case for number == 0 and implemented a fix, although my fix is not what you suggested.

Manishearth · 2025-08-30T00:56:15Z

utils/tinystr/src/ascii.rs

+    /// assert_eq!(str_truncated, tinystr!(2, "56"));
+    /// assert_eq!(str_saturated, tinystr!(2, "99"));
+    /// ```
+    pub fn new_decimal_usize(number: usize) -> Result<Self, Self> {


nit: pick a real integer type

nit 2: probably just new_decimal or new_unsigned_decimal? We most likely won't need more methods.

Manishearth · 2025-08-30T01:01:36Z

utils/tinystr/src/ascii.rs

+        }
+        let s = Self {
+            // SAFETY: `out` only contains ASCII bytes and has same size as `self.bytes`
+            bytes: unsafe { AsciiByte::to_ascii_byte_array(&bytes) },


issue: this is a complex function, and the invariant is hard to track

Instead, we can write this function safely, and have it call AsciiByte::from_digit(). from_digit() has a debug-assertion invariant that it is given a digit, and it does a bitmask-and-add to produce an ASCII byte.

You can also write it fully safely using a match, and it should compile to a jump table, but it's understandable if you're not sure of that. Play around with it in godbolt, perhaps.

sffc · 2025-08-30T01:42:08Z

/gemini review

Copilot

Pull Request Overview

This PR adds a helper function new_unsigned_decimal to the TinyAsciiStr type for creating ASCII strings from decimal representations of unsigned integers. This functionality was previously only possible using unwrapping or unsafe code outside the tinystr crate.

Adds AsciiByte::from_decimal_digit helper function for converting single digits to ASCII bytes
Implements TinyAsciiStr::new_unsigned_decimal for creating decimal string representations
Includes comprehensive documentation and examples for the new functionality

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
utils/tinystr/src/asciibyte.rs	Adds documentation comment and implements `from_decimal_digit` helper function
utils/tinystr/src/ascii.rs	Implements `new_unsigned_decimal` method with full documentation and examples

Comments suppressed due to low confidence (1)

utils/tinystr/src/ascii.rs:1

The example shows 'saturating' behavior, but the function doesn't actually saturate - it truncates the high-order digits. The variable name 'str_saturated' is misleading since it's manually providing a fallback value rather than the function performing saturation.

// This file is part of ICU4X. For terms of use, please see the file

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-08-30T01:42:27Z

utils/tinystr/src/ascii.rs

+        }
+        if i < N {
+            bytes.copy_within((N - i)..N, 0);
+            bytes[i..N].fill(AsciiByte::B0);


This line fills unused bytes with null characters (B0) instead of space characters, which could result in strings with embedded nulls. This is likely incorrect for ASCII string representation where unused space should be filled with spaces or the string should be properly null-terminated.

Suggested change

bytes[i..N].fill(AsciiByte::B0);

bytes[i..N].fill(AsciiByte::SPACE);

No Copilot, that isn't how TinyAsciiStr works.

gemini-code-assist

Code Review

This pull request adds a new helper function TinyAsciiStr::new_unsigned_decimal to create a TinyAsciiStr from an unsigned integer. This is a useful addition, as it encapsulates logic that would be tricky to implement correctly outside the crate. The implementation is clever and handles edge cases like 0 and numbers that don't fit into the string's capacity. The documentation and examples are clear and helpful.

I've made one suggestion in asciibyte.rs to improve code conciseness.

Also, note that the PR title mentions new_decimal_usize, but the implemented function is new_unsigned_decimal which takes a u32. This seems like a good choice, but you might want to update the PR title to reflect the final implementation.

gemini-code-assist · 2025-08-30T01:43:48Z

utils/tinystr/src/asciibyte.rs

+    pub(crate) fn from_decimal_digit(digit: u8) -> AsciiByte {
+        // Note: This code optimizes nicely with no branches.
+        match digit {
+            0 => AsciiByte::B48,
+            1 => AsciiByte::B49,
+            2 => AsciiByte::B50,
+            3 => AsciiByte::B51,
+            4 => AsciiByte::B52,
+            5 => AsciiByte::B53,
+            6 => AsciiByte::B54,
+            7 => AsciiByte::B55,
+            8 => AsciiByte::B56,
+            9 => AsciiByte::B57,
+            _ => {
+                debug_assert!(false, "not a single digit: {digit}");
+                AsciiByte::B32 // Space
+            }
+        }
+    }


The match statement is a bit verbose for this conversion. You can achieve the same result more concisely and idiomatically by using an array lookup.

pub(crate) fn from_decimal_digit(digit: u8) -> AsciiByte { const DIGITS: [AsciiByte; 10] = [ AsciiByte::B48, AsciiByte::B49, AsciiByte::B50, AsciiByte::B51, AsciiByte::B52, AsciiByte::B53, AsciiByte::B54, AsciiByte::B55, AsciiByte::B56, AsciiByte::B57, ]; DIGITS.get(digit as usize).copied().unwrap_or_else(|| { debug_assert!(false, "not a single digit: {digit}"); AsciiByte::B32 // Space }) }

Meh, I think the match statement is more readable.

Add TinyAsciiStr::new_decimal_usize

c7754be

sffc requested review from Manishearth and Copilot August 30, 2025 00:39

sffc requested a review from zbraniecki as a code owner August 30, 2025 00:39

This comment was marked as outdated.

Sign in to view

Clarify docs

f7fd6a6

gemini-code-assist bot reviewed Aug 30, 2025

View reviewed changes

sffc added 2 commits August 29, 2025 17:52

Fix n == 0

6764fe6

clippy

ffb8f3c

Manishearth previously approved these changes Aug 30, 2025

View reviewed changes

sffc added 2 commits August 29, 2025 18:04

new_unsigned_decimal(u32)

26a5a91

No unsafe code

bdd975a

sffc dismissed Manishearth’s stale review via bdd975a August 30, 2025 01:39

sffc requested a review from Copilot August 30, 2025 01:42

Copilot AI reviewed Aug 30, 2025

View reviewed changes

gemini-code-assist bot reviewed Aug 30, 2025

View reviewed changes

Manishearth approved these changes Aug 30, 2025

View reviewed changes

sffc merged commit 77c6fd7 into unicode-org:main Aug 30, 2025
32 checks passed

sffc deleted the new_decimal_usize branch August 30, 2025 02:07

	bytes[i..N].fill(AsciiByte::B0);
	bytes[i..N].fill(AsciiByte::SPACE);

Add TinyAsciiStr::new_decimal_usize #6912

Add TinyAsciiStr::new_decimal_usize #6912

Uh oh!

Conversation

sffc commented Aug 30, 2025

Uh oh!

gemini-code-assist bot commented Aug 30, 2025

Uh oh!

This comment was marked as outdated.

Uh oh!

sffc commented Aug 30, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

sffc Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

Manishearth Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

Manishearth Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

sffc commented Aug 30, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

sffc Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

sffc Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!