Skip to content

gh-130197: pygettext: Test the --escape option #131902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 2, 2025

Conversation

tomasr8
Copy link
Member

@tomasr8 tomasr8 commented Mar 30, 2025

Adds tests for the -E/--escape option in pygettext. This option controls how non-ascii strings are written to the POT file.

If the option is not set, characters are written as is. If the option is set, non-ascii characters are encoded using octal escape sequences (unicode-escape). For example, α becomes \316\261.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't tests test_not_escaped and test_escaped redundant if we can use escapes.pot? We can just add _() with a long string that contain interesting characters in range 1-255.

Test for codes 127-255 look incorrect. They should produce a pair of escapes in utf-8 encoding.

Needed tests for Python sources with non-UTF-8 encoding.

@tomasr8
Copy link
Member Author

tomasr8 commented Mar 31, 2025

Aren't tests test_not_escaped and test_escaped redundant if we can use escapes.pot? We can just add _() with a long string that contain interesting characters in range 1-255.

I thought about it, but I was worried it'd be hard to review such long strings. I pushed a new change that does that just to see how it would look. Let me know what you think!

I use the same source escapes.py to generate escapes.pot which uses the --escape option, and ascii-escapes.pot which does not use the --escape option. This lets us see the differences between the two escape methods.

Needed tests for Python sources with non-UTF-8 encoding.

Do you mean that just in context of escaping or you want to have some tests in general that use non-utf8 sources?

@serhiy-storchaka
Copy link
Member

Do you mean that just in context of escaping or you want to have some tests in general that use non-utf8 sources?

Both. We need this to test with and without --escape option. In particularly, to see what encoding is used in the POT file and in the header. In file with non-UTF-8 encoding use also characters not encodable with the source encoding (\uXXXX).

@tomasr8
Copy link
Member Author

tomasr8 commented Apr 1, 2025

Do you mean that just in context of escaping or you want to have some tests in general that use non-utf8 sources?

Both. We need this to test with and without --escape option. In particularly, to see what encoding is used in the POT file and in the header. In file with non-UTF-8 encoding use also characters not encodable with the source encoding (\uXXXX).

I'll gladly add it, though there is currently no way to specify the input encoding in pygettext - it uses the file object encoding which is locale-dependent. In tests we use -Xutf8 to force utf-8 on every platform for reproducibility.

Note that xgettext has the --from-code option which sets the input encoding. Perhaps we could add this option to pygettext and use it for the non-utf8 tests?

@serhiy-storchaka
Copy link
Member

The tokenize module can be used to detect the source encoding. If it is not used now, this is a separate issue.

@tomasr8
Copy link
Member Author

tomasr8 commented Apr 1, 2025

The tokenize module can be used to detect the source encoding. If it is not used now, this is a separate issue.

That'd be even better! And no, we don't do that currently, we do f = open(...); f.encoding. Another thing to add to my todo list together with the non-printable characters 😆

Other than that I think I addressed your review, is there anything else I should add now?

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@serhiy-storchaka serhiy-storchaka merged commit 87d9983 into python:main Apr 2, 2025
42 checks passed
@serhiy-storchaka serhiy-storchaka added needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Apr 2, 2025
@miss-islington-app
Copy link

Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@miss-islington-app
Copy link

Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @tomasr8 and @serhiy-storchaka, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 87d9983994e9a423e9e0050b1bbee52ebaf84367 3.12

@miss-islington-app
Copy link

Sorry, @tomasr8 and @serhiy-storchaka, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 87d9983994e9a423e9e0050b1bbee52ebaf84367 3.13

@tomasr8
Copy link
Member Author

tomasr8 commented Apr 2, 2025

Thanks Serhiy!

It might be easier to not backport this as some of the test infrastructure this PR relies on was not backported. I'll double-check later when I have time.

@tomasr8 tomasr8 deleted the pygettext-cli-escape branch April 2, 2025 10:09
tomasr8 added a commit to tomasr8/cpython that referenced this pull request Apr 2, 2025
…-131902)

(cherry picked from commit 87d9983)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
tomasr8 added a commit to tomasr8/cpython that referenced this pull request Apr 2, 2025
…-131902)

(cherry picked from commit 87d9983)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
tomasr8 added a commit to tomasr8/cpython that referenced this pull request Apr 2, 2025
…-131902)

(cherry picked from commit 87d9983)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
tomasr8 added a commit to tomasr8/cpython that referenced this pull request Apr 2, 2025
…-131902)

(cherry picked from commit 87d9983)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Apr 2, 2025

GH-132032 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Apr 2, 2025
tomasr8 added a commit to tomasr8/cpython that referenced this pull request Apr 2, 2025
…-131902)

(cherry picked from commit 87d9983)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
tomasr8 added a commit to tomasr8/cpython that referenced this pull request Apr 2, 2025
…-131902)

(cherry picked from commit 87d9983)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Apr 2, 2025

GH-132033 is a backport of this pull request to the 3.12 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants