gh-130197: pygettext: Fix and test the `--exclude-file` option #131381

tomasr8 · 2025-03-17T21:15:09Z

Fixes and adds tests for the --exclude-file option. This is what the option does:

--exclude-file=filename
Specify a file that contains a list of strings that are not be
extracted from the input files. Each string to be excluded must
appear on a line by itself in the file.

Previously, the file was read with f.readlines(), which keeps the trailing \n character on each excluded string, so the option did not work properly. I changed this to f.read().splitlines(), which strips the newlines, so the option should now work as expected.
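To illustrate the difference described above, here is a minimal standalone sketch. It does not use pygettext itself; the in-memory file and the strings `foo` and `bar` are made up for the example.

```python
import io

# Hypothetical exclude-file contents: one string per line.
exclude_file = io.StringIO("foo\nbar\n")

# Old behaviour: readlines() keeps the trailing newline on each entry.
old_entries = exclude_file.readlines()

exclude_file.seek(0)

# New behaviour: read().splitlines() drops the line terminators.
new_entries = exclude_file.read().splitlines()

extracted = "foo"  # a string as it would be extracted from source code
print(extracted in old_entries)  # False, the stored entry is "foo\n"
print(extracted in new_entries)  # True
```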

Issue: pygettext: Improve test coverage #130197

serhiy-storchaka · 2025-03-18T10:39:41Z

So, --exclude-file does not work and never worked as intended.

Note also that if it worked as intended, it would work differently than in xgettext. So why not implement the xgettext behavior?

tomasr8 · 2025-03-18T15:32:01Z

> So, --exclude-file does not work and never worked as intended.

Pretty much, yes. I discovered this by chance while writing the tests for it.

> Note also that if it worked as intended, it would work differently than in xgettext. So why not implement the xgettext behavior?

If you're ok with that, I can implement it. The xgettext --exclude-file option reads the msgids from a PO file. Pygettext currently only knows how to write a PO file but not how to read it. msgfmt can read PO files so we could reuse the code from there. Perhaps create some helper file for PO reading/writing?

serhiy-storchaka · 2025-03-18T18:24:22Z

For now, I think that we can just copy the code from msgfmt.py. In future we may add the code for reading the PO files in the gettext module.

tomasr8 · 2025-03-18T18:41:31Z

Ok, I'll just copy it over then for now :)

tomasr8 · 2025-03-20T22:23:22Z

> In future we may add the code for reading the PO files in the gettext module.

That would actually be really helpful for my work as well. I'd be interested in contributing this. Have you already thought about how you'd like it to work? Should it be something compatible with GNUTranslations or do we want a richer interface that allows you to access e.g. the comments and flags as well?

tomasr8 · 2025-03-26T20:23:21Z

> For now, I think that we can just copy the code from msgfmt.py

Just to give an update, it probably won't be as simple. I started off by copying the code with minimal changes (just to get it to work with pygettext) and I discovered several bugs in the parsing logic when I started writing tests. I'll take some inspiration from the original code but I will need to make quite a few changes in order to make PO parsing work correctly.

tomasr8 · 2025-03-29T17:20:56Z

Tools/i18n/pygettext.py

+    #   - \n, \r, \t, \\, \", \a, \b, \f, \v
+    #   - Octal escapes: \o, \oo, \ooo
+    #   - Hex escapes: \xh, \xhh, ...


This is not standardized but I checked this with GNU msgfmt and these are the ones that are allowed.
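As a rough illustration of what decoding these escapes involves, here is a simplified sketch. It is not the code from this PR: the names `SIMPLE_ESCAPES` and `unescape` are hypothetical, octal escapes are mapped to code points with `chr()` purely as an assumption, and hex escapes are left out on purpose because their exact length rule is not spelled out above.

```python
import re

# The single-character escapes listed in the comment above.
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"',
    "a": "\a", "b": "\b", "f": "\f", "v": "\v",
}

# A backslash followed by 1-3 octal digits, or by any single character.
_ESCAPE_RE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)


def unescape(text):
    """Decode simple and octal escapes; anything else is rejected."""
    def replace(match):
        body = match.group(1)
        if body[0] in "01234567":
            # Assumption: treat the octal value as a code point.
            return chr(int(body, 8))
        try:
            return SIMPLE_ESCAPES[body]
        except KeyError:
            raise ValueError(f"unsupported escape sequence: \\{body}") from None
    return _ESCAPE_RE.sub(replace, text)


print(repr(unescape(r"tab\there")))
```

A real parser would also need to decide how to treat hex escapes, a trailing lone backslash, and the file's encoding, which this sketch does not attempt.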

tomasr8 · 2025-03-29T17:21:26Z

Tools/i18n/pygettext.py

+            f"The file {filename} starts with a UTF-8 BOM which is not "
+            "allowed in .po files.\nPlease save the file without a BOM "
+            "and try again.")


This is the same error we added to msgfmt.py recently.
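For context, a BOM check of this kind could look roughly like the sketch below. The function name `reject_utf8_bom` and the choice of `ValueError` are assumptions made for the example; only the message text comes from the diff above.

```python
import codecs


def reject_utf8_bom(data, filename):
    """Return the raw bytes unchanged, or raise if they start with a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        # ValueError is a placeholder; the real tool may report this differently.
        raise ValueError(
            f"The file {filename} starts with a UTF-8 BOM which is not "
            "allowed in .po files.\nPlease save the file without a BOM "
            "and try again.")
    return data


# The file name and contents here are made up for the example.
reject_utf8_bom(b'msgid "hello"\n', "messages.po")
```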

tomasr8 · 2025-03-29T17:28:15Z

I implemented PO parsing in order to make --exclude-file work as it does in xgettext. I started with the code in msgfmt.py but, as I wrote before, there are some issues with it (e.g. bugs, differences from GNU msgfmt), which I fixed, and I added lots of tests (actually most of this PR is tests now, so hopefully it won't be that daunting to review 😄).

So to summarize:

- pygettext is now able to parse PO files. I also verified that the behaviour matches GNU msgfmt (there are lots of edge cases, so I may have missed some).
- --exclude-file now accepts a PO file. All msgids from this file are ignored when extracting.
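To give a feel for what collecting msgids from a PO file means, here is a deliberately naive sketch. It is not the parser from this PR: `read_msgids` is a hypothetical name, string literals are decoded with Python's `ast.literal_eval` rather than PO escape rules, and msgctxt, msgid_plural, obsolete entries and encodings are all ignored.

```python
import ast


def read_msgids(po_text):
    """Collect msgids from PO text (simplified; the header's empty msgid is kept)."""
    msgids = set()
    current = None  # string fragments of the msgid being read, or None
    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            current = [ast.literal_eval(line[len("msgid "):])]
        elif line.startswith('"') and current is not None:
            # Continuation line of a multi-line msgid.
            current.append(ast.literal_eval(line))
        elif current is not None:
            # Any other line (msgstr, comment, blank) ends the msgid.
            msgids.add("".join(current))
            current = None
    if current is not None:
        msgids.add("".join(current))
    return msgids


sample = '''
msgid "Hello"
msgstr "Ahoj"

msgid ""
"multi"
"line"
msgstr ""
'''
print(sorted(read_msgids(sample)))
```

The edge cases mentioned in this thread are exactly what such a short version gets wrong, which is why the real parsing code is much longer.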

@serhiy-storchaka Do you think you could have a look at this when you have time? 🙂

serhiy-storchaka

Thank you for your work.

I am sorry, but it seems I underestimated the size and complexity of the parsing code. Worst of all, this is code that is likely to change in the future (to fix minor errors or add features). This turned out to be not such a good idea.

The code needs to be shared in some way. Maybe just import a function from msgfmt?

Wait, how did it happen that the whole msgfmt.py is smaller than the parsing code in this PR?

serhiy-storchaka · 2025-04-09T18:33:00Z

Tools/i18n/pygettext.py

@@ -742,14 +1006,18 @@ class Options:
    # initialize list of strings to exclude
    if options.excludefilename:
        try:
-            with open(options.excludefilename) as fp:
-                options.toexclude = fp.readlines()
+            options.toexclude = get_msgids_from_exclude_file(


It seems that in xgettext you can specify --exclude-file multiple times, and all msgids are added.
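One way to support a repeatable option is sketched below with argparse. This is a standalone illustration and not pygettext's actual option handling; `parse_args`, `merge_excluded` and the `load` callable are hypothetical names introduced for the example.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the option into a list.
    parser.add_argument("--exclude-file", action="append", default=[],
                        dest="exclude_files", metavar="FILE")
    return parser.parse_args(argv)


def merge_excluded(filenames, load):
    """Union the msgids returned by load(filename) for every exclude file."""
    excluded = set()
    for filename in filenames:
        excluded |= load(filename)
    return excluded


args = parse_args(["--exclude-file", "a.po", "--exclude-file", "b.po"])

# Stand-in loader with made-up contents instead of real PO parsing.
fake_files = {"a.po": {"foo"}, "b.po": {"bar", "foo"}}
print(sorted(merge_excluded(args.exclude_files, fake_files.__getitem__)))
```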

serhiy-storchaka · 2025-04-09T19:23:17Z

I think that it is better to fix the parser in msgfmt first.

tomasr8 · 2025-04-10T07:38:30Z

> I think that it is better to fix the parser in msgfmt first.

Originally, I meant to port this change to msgfmt after this is merged, but I can do the opposite. I'll send a PR to fix the msgfmt parser, hopefully in a few days!

tomasr8 added 2 commits March 17, 2025 21:57

Test pygettext --exclude-file option

b1080aa

Add news entry

36370a3

bedevere-app bot added the awaiting review label Mar 17, 2025

bedevere-app bot mentioned this pull request Mar 17, 2025

pygettext: Improve test coverage #130197

Open

18 tasks

Fix news entry

6969bc3

tomasr8 requested a review from serhiy-storchaka March 18, 2025 10:22

tomasr8 added 2 commits March 29, 2025 18:06

Align --exclude-file behaviour with xgettext

0425936

Allows passing a regular PO file as an exclude file. All msgids in this file will be excluded when extracting. This adds PO parsing capability to pygettext.

Lint fix

38a4427

tomasr8 commented Mar 29, 2025

Merge remote-tracking branch 'upstream/main' into pygettext-cli-exclude

91d9f67

serhiy-storchaka reviewed Apr 9, 2025

tomasr8 marked this pull request as draft April 12, 2025 09:52

bedevere-app bot removed the awaiting review label Apr 12, 2025

tomasr8 mentioned this pull request Apr 27, 2025

gh-130197: Improve test coverage of msgfmt.py #133048

Merged
