printf: accept non-UTF-8 input in FORMAT and ARGUMENT arguments #7209

Open: wants to merge 2 commits into base: main

jtracey (Contributor) commented Jan 25, 2025

Rebases #6812 on #7208.

EDIT: Now also includes a commit from me to pass the printf-mb GNU test, as well as a few other odds and ends. Fixes #6804.

jtracey (Contributor Author) commented Jan 25, 2025

This is just a rebase. It compiles and passes our tests, but there are still some kinks to work out before it works as expected and passes the GNU tests.

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/usage_vs_getopt (passes in this run but fails in the 'main' branch)

sylvestre (Contributor) commented

Needs to be rebased again :/ Sorry

@jtracey jtracey force-pushed the printf-allow-non-utf-8 branch 3 times, most recently from ca73a2d to 51649ec Compare February 22, 2025 01:48

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

jtracey (Contributor Author) commented Feb 22, 2025

Rebased on #7208 again. This still needs some cleaning up IMO, but I'm going to hold off until #7208 gets merged.

@jtracey jtracey force-pushed the printf-allow-non-utf-8 branch from 51649ec to 6d9ab8f Compare April 30, 2025 19:14

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@jtracey jtracey force-pushed the printf-allow-non-utf-8 branch from 6d9ab8f to 2ec2433 Compare April 30, 2025 20:21

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@jtracey jtracey force-pushed the printf-allow-non-utf-8 branch from 2ec2433 to 016df8a Compare May 3, 2025 01:05
jtracey (Contributor Author) commented May 3, 2025

The force push before the most recent one is the last with the individual commits from #6812. The most recent force push squashes them into one commit and adds my fixes in a second commit. The final rebased #6812 commits before squashing are: 9eddbca, a9f53a6, bc7516a, 2ec2433.

github-actions bot commented May 3, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

andrewliebenow and others added 2 commits May 2, 2025 22:10
Other implementations of `printf` permit arbitrary data to be passed
to `printf`. The only restriction is that a null byte terminates
FORMAT and ARGUMENT argument strings (since they are C strings).

The current implementation only accepts FORMAT and ARGUMENT
arguments that are valid UTF-8 (this is being enforced by clap).

This commit removes the UTF-8 validation by switching to OsStr
and OsString.

This allows users to use `printf` to transmit or reformat null-safe
but not UTF-8-safe data, such as text encoded in an 8-bit text
encoding. See the `non_utf_8_input` test for an example (ISO-8859-1
text).
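
As a rough illustration of the commit message above (not the code from this PR), the sketch below shows why `OsStr`/`OsString` can carry an argument that `String` would reject. It assumes a Unix target, and `write_raw` is a hypothetical helper name introduced only for this example.

```rust
use std::ffi::{OsStr, OsString};
use std::io::{self, Write};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical helper (not from the PR): write an argument's raw bytes
/// to `out` without any UTF-8 validation.
fn write_raw(out: &mut impl Write, arg: &OsStr) -> io::Result<()> {
    out.write_all(arg.as_bytes())
}

fn main() -> io::Result<()> {
    // "café\n" encoded as ISO-8859-1: the lone byte 0xE9 is not valid UTF-8.
    let latin1 = OsString::from_vec(vec![b'c', b'a', b'f', 0xE9, b'\n']);

    // Any code path that requires a `&str` cannot represent this argument.
    assert!(latin1.to_str().is_none());

    // The OsStr-based path passes the bytes through unchanged.
    let mut buf = Vec::new();
    write_raw(&mut buf, &latin1)?;
    assert_eq!(buf, [b'c', b'a', b'f', 0xE9, b'\n']);

    io::stdout().write_all(&buf)
}
```

On Unix an `OsStr` is an arbitrary byte sequence, so no conversion is needed. Other platforms represent `OsStr` differently, and this sketch does not cover them.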
@jtracey jtracey force-pushed the printf-allow-non-utf-8 branch from 016df8a to 0760508 Compare May 3, 2025 02:10
@jtracey jtracey marked this pull request as ready for review May 3, 2025 02:11
jtracey (Contributor Author) commented May 3, 2025

To spell out a bit what my commit does: there was some redundant handling of various pieces of parsing format arguments, each with its own bugs. I minimized and simplified that, and got most things implemented only once, in more appropriate locations. As a result, e.g., behavior no longer differs between %i and %d format strings (aside from the type), and misc. utils like seq no longer accept things like 'a as numbers.
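
A minimal sketch of the idea described above, under stated assumptions: it is not the code from this PR, all function names are hypothetical, and it works on `&str` for brevity instead of `OsStr`. It shows one shared integer parser, a printf-only layer for quote-prefixed arguments such as `'a`, and `%d` and `%i` routed through the same path.

```rust
/// Hypothetical shared parser for utilities such as seq:
/// plain decimal integers only, no printf extensions.
fn parse_integer(s: &str) -> Option<i64> {
    s.parse().ok()
}

/// Hypothetical printf-only layer: an argument starting with ' or "
/// evaluates to the code point of the character that follows.
/// Any further trailing characters are ignored in this sketch.
fn parse_printf_integer(s: &str) -> Option<i64> {
    let mut chars = s.chars();
    match chars.next() {
        Some('\'') | Some('"') => chars.next().map(|c| i64::from(u32::from(c))),
        _ => parse_integer(s),
    }
}

/// %d and %i share one code path, so their behavior cannot diverge.
fn format_signed(conversion: char, arg: &str) -> Option<String> {
    match conversion {
        'd' | 'i' => parse_printf_integer(arg).map(|n| n.to_string()),
        _ => None,
    }
}

fn main() {
    // printf accepts the quote-prefixed form for both conversions.
    assert_eq!(format_signed('d', "'a"), Some("97".to_string()));
    assert_eq!(format_signed('i', "'a"), Some("97".to_string()));
    // The shared parser, as a seq-like utility would use it, rejects it.
    assert_eq!(parse_integer("'a"), None);
    assert_eq!(format_signed('d', "-42"), Some("-42".to_string()));
}
```

Keeping the quote handling in the printf-specific layer is what stops other utilities from picking it up by accident.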

github-actions bot commented May 3, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-mb is no longer failing!

Successfully merging this pull request may close these issues.

echo, printf: UTF-8 sensitivity in arguments