
tr: operate on bytes instead of chars #5640


Merged
merged 1 commit into from
Dec 19, 2023


tertsdiepraam
Member

Closes #5627

Makes tr operate on bytes only. GNU's behaviour is documented as:

The interpretation of string1 and string2 depends on locale. GNU tr fully supports only safe single-byte locales, where each possible input byte represents a single character. Unfortunately, this means GNU tr will not handle commands like ‘tr ö Ł’ the way you might expect, since (assuming a UTF-8 encoding) this is equivalent to ‘tr '\303\266' '\305\201'’ and GNU tr will simply transliterate all ‘\303’ bytes to ‘\305’ bytes, etc. POSIX does not clearly specify the behavior of tr in locales where characters are represented by byte sequences instead of by individual bytes, or where data might contain invalid bytes that are encoding errors.

We don't use the locale yet; with this change, tr operates on bytes only. This means that multi-byte Unicode characters are no longer supported as single characters.
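
To illustrate the byte-wise behaviour described in the GNU quote, here is a minimal sketch of translation through a 256-entry lookup table. It is not the code from this PR: `build_table` and `translate` are hypothetical names chosen for the example, and padding a shorter second set with its last byte is an assumption about how the sets are aligned.

```rust
/// Hypothetical helper: build a 256-entry table that maps each byte of
/// `from` to the byte at the same position in `to`.
/// Assumption: if `to` is shorter than `from`, its last byte is repeated.
/// Bytes that do not appear in `from` map to themselves.
fn build_table(from: &[u8], to: &[u8]) -> [u8; 256] {
    // Start with the identity mapping.
    let mut table = [0u8; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = i as u8;
    }
    // An empty `to` leaves the identity mapping untouched.
    if let Some(&last) = to.last() {
        for (i, &src) in from.iter().enumerate() {
            table[src as usize] = to.get(i).copied().unwrap_or(last);
        }
    }
    table
}

/// Hypothetical helper: translate the input one byte at a time, with no
/// knowledge of UTF-8 or any other multi-byte encoding.
fn translate(input: &[u8], table: &[u8; 256]) -> Vec<u8> {
    input.iter().map(|&b| table[b as usize]).collect()
}
```

With these helpers, `translate("ö".as_bytes(), &build_table("ö".as_bytes(), "Ł".as_bytes()))` yields the bytes `\305\201`, which is `Ł`. The same table also rewrites `é` (`\303\251`) to `\305\251`, because `é` shares its first byte with `ö`. That is the "all `\303` bytes to `\305` bytes" effect the GNU documentation warns about.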

@sylvestre sylvestre merged commit 9920f13 into uutils:main Dec 19, 2023
Development

Successfully merging this pull request may close these issues.

tr: no output for some inputs