
pygettext: Add --omit-header option #130647


Open
StanFromIreland opened this issue Feb 27, 2025 · 7 comments
Labels
triaged: The issue has been accepted as valid by a triager.
type-feature: A feature request or enhancement.

Comments

@StanFromIreland
Contributor

StanFromIreland commented Feb 27, 2025

Feature or enhancement

Proposal:

From gettext:

‘--omit-header’

Don’t write header with ‘msgid ""’ entry. Note: Using this option may lead to an error in subsequent operations if the output contains non-ASCII characters.

This is useful for testing purposes because it eliminates a source of variance for generated .gmo files. With --omit-header, two invocations of xgettext on the same files with the same options at different times are guaranteed to produce the same results.

Note that using this option will lead to an error if the resulting file would not entirely be in ASCII.

Will be useful for our tests. PR soon

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response


@tomasr8
Member

tomasr8 commented Feb 27, 2025

This flag is generally useful, but to expand on this:

> Will be useful for our tests. PR soon

We currently have i18n tests for argparse, optparse and getopt. The snapshots are in Lib/test/translationdata.
The current snapshot format is just a list of msgids, which is not great because it omits, for example, msgid_plural and msgctxt.

With --omit-header (and --no-location) we can use .po files for the snapshots instead. With the header removed, the snapshots will not change unless the source strings change.
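To illustrate why a header-free file is stable, the sketch below drops the header entry from extractor output after the fact. It assumes entries are separated by blank lines and that the header is the only entry whose empty msgid is immediately followed by msgstr. `strip_header` is a hypothetical helper for illustration, not part of pygettext.

```python
def strip_header(pot_text):
    """Return pot_text without the header entry (the one with msgid "").

    Assumption: entries are separated by blank lines, and only the header
    has an empty msgid directly followed by msgstr. Multi-line msgids also
    start with 'msgid ""' but are followed by quoted continuation lines.
    """
    entries = pot_text.split('\n\n')
    kept = [e for e in entries if 'msgid ""\nmsgstr ""' not in e]
    return '\n\n'.join(kept)


sample = (
    '# Header comment\n'
    'msgid ""\n'
    'msgstr ""\n'
    '"Project-Id-Version: PACKAGE VERSION\\n"\n'
    '"POT-Creation-Date: 2025-02-27 12:00+0000\\n"\n'
    '\n'
    'msgid "hello"\n'
    'msgstr ""\n'
    '\n'
    'msgid ""\n'
    '"multi-line "\n'
    '"message"\n'
    'msgstr ""\n'
)
stripped = strip_header(sample)
```

Once the header is gone, nothing in the remaining text depends on when the extractor ran, which is the property the snapshots need.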

@encukou
Member

encukou commented Feb 28, 2025

How important is it to match xgettext behaviour?
A new user-visible option adds some maintenance overhead. This one doesn't look useful to users.
I haven't seen the tests this is useful for, but I imagine we'd still want to test the behaviour without --omit-header.

Would it suffice to patch time.strftime in the tests? Or to delete the line?
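A minimal sketch of the patching approach, assuming the header timestamp is produced through `time.strftime`. `make_header` is a hypothetical stand-in for the header-writing code, not pygettext's actual function, and the frozen date is arbitrary.

```python
import time
from unittest import mock


def make_header():
    # Hypothetical stand-in for the code that writes the POT header date.
    return 'POT-Creation-Date: ' + time.strftime('%Y-%m-%d %H:%M%z')


def make_header_frozen():
    # Replace time.strftime for the duration of the call so the
    # generated header no longer depends on the wall clock.
    with mock.patch('time.strftime', return_value='2000-01-01 00:00+0000'):
        return make_header()


frozen = make_header_frozen()
```

This keeps the header in the output, so the header-writing path is still exercised by the test.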

picnixz added the triaged label Feb 28, 2025
@StanFromIreland
Contributor Author

It's not about matching behavior; there are dozens of missing options.

It can be used by users (smaller .mo files would be one use case, I guess), but it is mostly for tests.

Almost anything is possible, but .mo files are binary. Removing the ~15 lines of header text (they come from the .pot, and we can't be sure exactly what they contain) from the file after it has been compiled will not be simple. Adding the option, on the other hand, is much simpler.

@tomasr8
Member

tomasr8 commented Feb 28, 2025

> How important is it to match xgettext behaviour?

If you're worried about adding lots of CLI options, I actually don't want to add any more options (maybe besides setting the input/output encoding but I want to do more research on that). I'm mostly concerned with making pygettext work correctly and fixing the existing options.

I proposed to add this option for a few reasons which I should've elaborated more on. First, it's useful when you often deal with .po files (part of my work is managing translations for our project) - you often want to have predictable output from the extractor and don't really care about the header. --omit-header lets you easily do that. Code search on GH also reveals quite a few instances of xgettext and pybabel being called with --omit-header inside Python scripts so it indeed seems to be a relatively common thing.

> I haven't seen the tests this is useful for, but I imagine we'd still want to test the behaviour without --omit-header.

Yes, in fact for pygettext tests we already patch the date in an ugly way using regex. My comment was about tests that merely use pygettext to extract strings. For example, there is a test for argparse that does this (and optparse and getopt). Currently, we use a not-so-great format where we just dump the msgids in a text file which has its own issues. Having --omit-header would allow us to do this in a nice way.
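For context, regex-based date patching might look roughly like the sketch below. The pattern, the placeholder date, and the name `normalize_pot_date` are assumptions for illustration and are not taken from the CPython test suite.

```python
import re

# Matches the whole header line, including the literal backslash-n that
# PO files store inside the quoted string.
DATE_RE = re.compile(r'^"POT-Creation-Date: .*\\n"$', re.MULTILINE)


def normalize_pot_date(pot_text, placeholder='2000-01-01 00:00+0000'):
    """Replace the creation date in POT text with a fixed placeholder."""
    fixed_line = '"POT-Creation-Date: ' + placeholder + '\\n"'
    # A callable replacement avoids re.sub interpreting the backslash.
    return DATE_RE.sub(lambda match: fixed_line, pot_text)


sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"POT-Creation-Date: 2025-02-27 12:00+0000\\n"\n'
    '"Generated-By: pygettext.py 1.5\\n"\n'
)
normalized = normalize_pot_date(sample)
```

Every consumer of the extractor output has to repeat this step, which is the inconvenience the comment above describes for tests that only use pygettext to extract strings.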

> A new user-visible option adds some maintenance overhead.

I think the maintenance overhead is quite small for such a simple option, but I don't want to presume too much, you obviously have more experience in that regard 🙂

@encukou
Member

encukou commented Mar 3, 2025

That makes sense. Thanks for elaborating!

@encukou
Member

encukou commented Apr 30, 2025

Discussing this a bit more:

> Note that using this option will lead to an error if the resulting file would not entirely be in ASCII.

This limitation would make the option unsuitable for our tests.
It's probably better to do something other than what gettext does.

Would it make sense to support SOURCE_DATE_EPOCH instead? It's a pretty standard way to "freeze" the date for all kinds of build tools.
There was opposition to it in gettext, but the arguments I could find seem outdated.
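A minimal sketch of how a tool could honor SOURCE_DATE_EPOCH when formatting a header date. The function name `creation_timestamp` and the date format are assumptions; this is not pygettext's implementation. The variable holds an integer number of seconds since the Unix epoch and is interpreted as UTC.

```python
import os
import time


def creation_timestamp(environ=os.environ):
    """Return a header date, frozen when SOURCE_DATE_EPOCH is set."""
    epoch = environ.get('SOURCE_DATE_EPOCH')
    if epoch is not None:
        # Reproducible path: format the given instant in UTC.
        return time.strftime('%Y-%m-%d %H:%M+0000', time.gmtime(int(epoch)))
    # Default path: current local time with its UTC offset.
    return time.strftime('%Y-%m-%d %H:%M%z')


frozen = creation_timestamp({'SOURCE_DATE_EPOCH': '86400'})
```

Passing the environment as a parameter keeps the sketch easy to test without mutating `os.environ`. The header stays in the output, so non-ASCII content is unaffected.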

@AA-Turner
Member

Seconding SOURCE_DATE_EPOCH; it seems to make the most sense to me.

A


5 participants