Skip to content

printf: accept non-UTF-8 input in FORMAT and ARGUMENT arguments #8329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

drinkcat
Copy link
Contributor

@drinkcat drinkcat commented Jul 11, 2025

Rebased #7209 yet again (I "had" to squash 2 commits together to make the rebase slightly easier..), I'll also attempt to apply the changes I suggested.

--

Rebases #6812 on #7208.

EDIT: Now also includes a commit from me to pass the printf-mb GNU test, as well as a few other odds and ends. Fixes #6804.

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

@sylvestre sylvestre force-pushed the printf-7209-update branch from 7a7f02b to f31f360 Compare July 11, 2025 05:04
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/printf/printf-mb is no longer failing!

andrewliebenow and others added 6 commits July 11, 2025 18:40
Other implementations of `printf` permit arbitrary data to be passed
to `printf`. The only restriction is that a null byte terminates
FORMAT and ARGUMENT argument strings (since they are C strings).

The current implementation only accepts FORMAT and ARGUMENT
arguments that are valid UTF-8 (this is being enforced by clap).

This commit removes the UTF-8 validation by switching to OsStr
and OsString.

This allows users to use `printf` to transmit or reformat null-safe
but not UTF-8-safe data, such as text encoded in an 8-bit text
encoding. See the `non_utf_8_input` test for an example (ISO-8859-1
text).

[drinkcat: also squashed in this commit to ease rebase]
Author: Justin Tracey <jdt.dev@unsuspicious.click>

uucore, printf: improve non-UTF-8 format arguments

This fixes handling of format arguments, in part by eliminating duplicate
implementations. Utilities with format arguments other than printf will no
longer accept things like "'a" as numbers, etc.
And fix all the callers, e.g. all the next_ functions.
In one case, we'll need an actual owned String in PartialMatch,
so it's easier to just use that. Removes a bunch of lifetime things
in the code too.
After this, we can use a common extract_value function.
@drinkcat drinkcat force-pushed the printf-7209-update branch from f31f360 to 045c269 Compare July 11, 2025 10:56
@drinkcat drinkcat marked this pull request as ready for review July 11, 2025 11:00
@drinkcat
Copy link
Contributor Author

@jtracey if you have time to review this, both the rebase and my added changes, it'd be absolutely awesome...

I ended up modifying ExtendedParserError::PartialMatch to actually own a String (instead of a &str), as we need to store that in parse_quote_start. Makes the diff blow up a bit sadly.

@sylvestre
Copy link
Contributor

It looks great to me!

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

Suggested by Gemini and I think that's not a bad idea.
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

echo, printf: UTF-8 sensitivity in arguments
4 participants