-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-130197: pygettext: Fix and test the --exclude-file
option
#131381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
So, Note also that if it worked as intended, it would work differently than in |
Pretty much, yes. I randomly discovered this when writing the tests for it..
If you're ok with that, I can implement it. The xgettext |
For now, I think that we can just copy the code from |
Ok, I'll just copy it over then for now :) |
That would actually be really helpful for my work as well. I'd be interested in contributing this. Have you already thought about how you'd like it to work? Should it be something compatible with |
Just to give an update, it probably won't be as simple. I started off by copying the code with minimal changes (just to get it to work with pygettext) and I discovered several bugs in the parsing logic when I started writing tests. I'll take some inspiration from the original code but I will need to make quite a few changes in order to make PO parsing work correctly. |
Allows passing a regular PO file as an exclude file. All msgids in this file will be excluded when extracting. This adds PO parsing capability to pygettext.
# - \n, \r, \t, \\, \", \a, \b, \f, \v | ||
# - Octal escapes: \o, \oo, \ooo | ||
# - Hex escapes: \xh, \xhh, ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not standardized but I checked this with GNU msgfmt
and these are the ones that are allowed.
f"The file {filename} starts with a UTF-8 BOM which is not " | ||
"allowed in .po files.\nPlease save the file without a BOM " | ||
"and try again.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same error we added to msgfmt.py
recently.
I implement PO parsing in order to make So to summarize:
@serhiy-storchaka Do you think you could have a look at this when you have time? 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work.
I am sorry, but it seems I underestimated the size and complexity of the parsing code. Worst of all, this is code that can likely be changed in future (to fix minor errors or adding features). This turned out to be not such a good idea.
The code needs to be shared in some way. Maybe just import a function from msgfmt
?
Wait, how is it happened that the whole msgfmt.py
is smaller than the parsing code in this PR?
@@ -742,14 +1006,18 @@ class Options: | |||
# initialize list of strings to exclude | |||
if options.excludefilename: | |||
try: | |||
with open(options.excludefilename) as fp: | |||
options.toexclude = fp.readlines() | |||
options.toexclude = get_msgids_from_exclude_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that in xgettext
you can specify --exclude-file
multiple times, and all msgids are added.
I think that it is better to fix the parser in |
Originally, I was meaning to port this change to msgfmt after this is merged, but I can do the opposite. I'll send a PR to fix the msgfmt parser hopefully in a few days! |
Fixes and adds tests for the
--exclude-file
option. This is what the option does:Previously, the file was read as
f.readlines()
which keeps the\n
character for each excluded string. This made the option not work properly. I changed this tof.read.splitlines()
which strips the newlines so the option should now work as expected.