printf: accept non-UTF-8 input in FORMAT and ARGUMENT arguments #7209
This is just a rebase. It compiles and passes our tests, but there are still some kinks to work out before it works as expected and passes the GNU tests.
GNU testsuite comparison:
Needs to be rebased again :/ Sorry
ca73a2d to 51649ec
51649ec to 6d9ab8f
6d9ab8f to 2ec2433
2ec2433 to 016df8a
Other implementations of `printf` permit arbitrary data to be passed as arguments. The only restriction is that a null byte terminates FORMAT and ARGUMENT argument strings (since they are C strings). The current implementation only accepts FORMAT and ARGUMENT arguments that are valid UTF-8 (clap enforces this). This commit removes the UTF-8 validation by switching to `OsStr` and `OsString`. This allows users to use `printf` to transmit or reformat data that is null-safe but not UTF-8-safe, such as text in an 8-bit text encoding. See the `non_utf_8_input` test for an example (ISO-8859-1 text).
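To illustrate why the switch matters, here is a minimal, Unix-only sketch using only the standard library. It is not the code from this PR: `arg_bytes` is a hypothetical helper, and the byte sequence is just an example of ISO-8859-1 text. It shows that an argument containing a lone `0xE9` byte cannot be viewed as `&str`, while `OsStr` still exposes the raw bytes unchanged.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical helper (not from the PR): return the raw bytes of an
/// argument without performing any UTF-8 validation. Unix-only, because
/// `OsStrExt::as_bytes` is provided by `std::os::unix`.
fn arg_bytes(arg: &OsStr) -> &[u8] {
    arg.as_bytes()
}

fn main() {
    // "café" encoded as ISO-8859-1: the final byte 0xE9 is not valid UTF-8.
    let latin1 = OsString::from_vec(vec![0x63, 0x61, 0x66, 0xE9]);

    // A `String`/`&str`-based argument type cannot represent this value.
    assert!(latin1.to_str().is_none());

    // An `OsStr`-based argument type keeps every byte intact.
    assert_eq!(arg_bytes(&latin1), &[0x63u8, 0x61, 0x66, 0xE9][..]);
}
```

On non-Unix platforms `OsStr` has a different internal representation, so a portable implementation would need a different way to get at the bytes; that is outside the scope of this sketch.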
016df8a to 0760508
To spell out what my commit does: there was some redundant handling of various pieces of format-argument parsing, each with its own bugs. I minimized and simplified that, and moved most things to a single implementation in a more appropriate location, so that, e.g., behavior no longer differs between …
Rebases #6812 on #7208.
EDIT: Now also includes a commit from me to pass the printf-mb GNU test, as well as a few other odds and ends. Fixes #6804.