Skip to content

printf: accept non-UTF-8 input in FORMAT and ARGUMENT arguments #8329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 15, 2025

Conversation

drinkcat
Copy link
Contributor

@drinkcat drinkcat commented Jul 11, 2025

Rebased #7209 yet again (I "had" to squash 2 commits together to make the rebase slightly easier..), I'll also attempt to apply the changes I suggested.

--

Rebases #6812 on #7208.

EDIT: Now also includes a commit from me to pass the printf-mb GNU test, as well as a few other odds and ends. Fixes #6804.

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

@sylvestre sylvestre force-pushed the printf-7209-update branch from 7a7f02b to f31f360 Compare July 11, 2025 05:04
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/printf/printf-mb is no longer failing!

@drinkcat drinkcat force-pushed the printf-7209-update branch from f31f360 to 045c269 Compare July 11, 2025 10:56
@drinkcat drinkcat marked this pull request as ready for review July 11, 2025 11:00
@drinkcat
Copy link
Contributor Author

@jtracey if you have time to review this, both the rebase and my added changes, it'd be absolutely awesome...

I ended up modifying ExtendedParserError::PartialMatch to actually own a String (instead of a &str), as we need to store that in parse_quote_start. Makes the diff blow up a bit sadly.

@sylvestre
Copy link
Contributor

It looks great to me!

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

@jtracey
Copy link
Contributor

jtracey commented Jul 15, 2025

Lgtm! Thanks for picking this up. FWIW, the first commit should probably have multiple authors, but if this gets squashed GitHub might handle it on its own. If not, it's not the end of the world if the commit history is a little messy.

@drinkcat drinkcat force-pushed the printf-7209-update branch from 4a472b8 to d2c7ec4 Compare July 15, 2025 01:32
@drinkcat
Copy link
Contributor Author

Lgtm! Thanks for picking this up. FWIW, the first commit should probably have multiple authors, but if this gets squashed GitHub might handle it on its own. If not, it's not the end of the world if the commit history is a little messy.

Oh, didn't know the logic (I think native git only allows a single author?), I did keep a "Author:" line in the first squashed commit, but github didn't pick it up. Looks like adding a Co-authored-by: tag in the trailer of that squashed commit did the trick (you now appear as an author).

Also I'll... rebase to fix the conflict.

@drinkcat drinkcat force-pushed the printf-7209-update branch from d2c7ec4 to 8437dc7 Compare July 15, 2025 01:59
@drinkcat
Copy link
Contributor Author

Rebased... Hopefully all tests still pass.

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

@sylvestre sylvestre self-requested a review July 15, 2025 06:09
@sylvestre
Copy link
Contributor

Interesting, github does not allow me to merge it

@drinkcat
Copy link
Contributor Author

Weird... Not sure if I can do anything to help... Looks like it's waiting for a review by you? But... "It is not required to merge." it says...

andrewliebenow and others added 7 commits July 16, 2025 00:28
Other implementations of `printf` permit arbitrary data to be passed
to `printf`. The only restriction is that a null byte terminates
FORMAT and ARGUMENT argument strings (since they are C strings).

The current implementation only accepts FORMAT and ARGUMENT
arguments that are valid UTF-8 (this is being enforced by clap).

This commit removes the UTF-8 validation by switching to OsStr
and OsString.

This allows users to use `printf` to transmit or reformat null-safe
but not UTF-8-safe data, such as text encoded in an 8-bit text
encoding. See the `non_utf_8_input` test for an example (ISO-8859-1
text).

[drinkcat: also squashed in this commit to ease rebase]
Author: Justin Tracey <jdt.dev@unsuspicious.click>

uucore, printf: improve non-UTF-8 format arguments

This fixes handling of format arguments, in part by eliminating duplicate
implementations. Utilities with format arguments other than printf will no
longer accept things like "'a" as numbers, etc.

Co-authored-by: Justin Tracey <jdt.dev@unsuspicious.click>
And fix all the callers, e.g. all the next_ functions.
In one case, we'll need an actual owned String in PartialMatch,
so it's easier to just use that. Removes a bunch of lifetime things
in the code too.
After this, we can use a common extract_value function.
Suggested by Gemini and I think that's not a bad idea.
@RenjiSann RenjiSann force-pushed the printf-7209-update branch from 8437dc7 to ccad817 Compare July 15, 2025 22:28
@RenjiSann RenjiSann merged commit 1bb7930 into uutils:main Jul 15, 2025
25 checks passed
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

@sylvestre
Copy link
Contributor

well done folks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

echo, printf: UTF-8 sensitivity in arguments
5 participants