gh-79516: allow msgfmt.py to compile multiple input po files #10875

s-ball · 2018-12-03T19:33:02Z

msgfmt.py (from Tools/i18n) can now reliably be passed more than one input po file.

In addition, its make central function can reliably be called repeatedly, which fixes bpo-9741

https://bugs.python.org/issue35335

Issue: msgfmt should be able to merge more than one po file #79516

Test option processing, and conversion of one single file with or without the -o option. Also test the little documented behaviour of merging two input files with -o option.

Also show that it is now possible to build multiple po files in one single script call.

the-knights-who-say-ni · 2018-12-03T19:33:06Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

… multi_inputs

When merging main into multi_inputs, the reference to os_helper was erroneously removed.

In 2018, all imports came from the test.support package. They are now splitted among various subpackages.

s-ball · 2025-01-23T21:00:02Z

My patches now successfully pass all tests. Is there anything else I should do?

The newly added tests require a 1.2 version.

… multi_inputs

s-ball · 2025-03-18T09:16:16Z

Please forgive my previous comment. I was tired and should not have posted it.

I wanted to keep a compatibility with the previous make function that accepted a string while I wanted to be able to pass an iterable of strings. This is what I meant with

Both ways are for compatibility reasons with previous behaviour.

in the docstring. As I now understand that any internal interface of a module from the Tools folder is an internal detail, it would indeed be much simpler and clearer to always pass an iterable of strings to make. I shall soon push a new commit doing that.

serhiy-storchaka · 2025-03-18T09:16:34Z

AFAIK, the differences between the GNU msgfmt and msgfmt.py:

The GNU msgfmt always combine multiple PO source files. msgfmt.py can produce separate MO files for every PO source file.
If the output file is not specified, the GNU msgfmt writes to message.mo. msgfmt.py infers the MO file name from the PO source file name.

So, combining multiple PO source files is needed for consistency with the GNU msgfmt, and producing multiple MO files is needed for backward compatibility with msgfmt.py. The question is how to combine these two modes, and your PR provides a reasonable compromise.

msgfmt.py does not belong to the stdlib and does not officially provide other API besides the CLI. If you want to use it programmatically, it is better to copy the code.

Removing the global MESSAGES is not necessary for fixing the msgfmt.py for multiple input files and no output file. You can just reset it between processing separate files if no output file is specified. But it will not harm.

The simplest fix of the original issue is:

diff --git a/Tools/i18n/msgfmt.py b/Tools/i18n/msgfmt.py
index 878d3e29077..d9c46fcd166 100755
--- a/Tools/i18n/msgfmt.py
+++ b/Tools/i18n/msgfmt.py
@@ -250,6 +250,9 @@ def main():
         return
 
     for filename in args:
+        if outfile is None:
+            global MESSAGES
+            MESSAGES = {}
         make(filename, outfile)

The code is inefficient if the output file is specified, because it will be rewritten again and again for every input file, but it will work. The code needs rewriting for efficiency and clarity, and for better API if we decide to provide it in future.

All the logic is now in make at the cost of a break in compatibility.

s-ball · 2025-03-18T14:10:26Z

@serhiy-storchaka I am afraid I did not fully understand your last comment. Do you really want me to bring back the global MESSAGES? Aside from that I have followed your advice.

serhiy-storchaka · 2025-03-18T14:56:27Z

No, it is only demonstration that all other changes is a beautification of the code. We usually does not allow pure cosmetic changes, but we can allow them if they are related to the behavior changes and make the final code clearer.

serhiy-storchaka

The GNU msgfmt treats duplicated keys as conflicts. Silently ignoring can lead to bugs.

serhiy-storchaka · 2025-03-18T15:18:50Z

Tools/i18n/msgfmt.py

+    BEWARE: in previous version of make the first parameter was a string
+    containing a single filename.


This note will soon became confusing, as it is not clear what version is previous. The right place for it is in the commit message, not in docstring.

serhiy-storchaka · 2025-03-18T15:23:06Z

Tools/i18n/msgfmt.py

+        # each PO file generates its corresponding MO file
+        for filename in filenames:
+            messages = {}
+            infile, outfile = get_names(filename, None)


Now we can refactor get_names(). Split it on two parts -- for infile and for outfile. The first part can be inlined and moved in process(), and the second part can be left here, in make().

serhiy-storchaka · 2025-03-18T15:23:46Z

Tools/i18n/msgfmt.py

-    ID = 1
-    STR = 2
-    CTXT = 3
+def make(filenames, outfile=None):


No default value is needed.

serhiy-storchaka · 2025-03-18T15:28:35Z

Lib/test/test_tools/test_msgfmt.py

+    """Tests for multiple input files
+    """
+
+    def test_no_outputfile(self):


This test looks redundant. test_both_without_outputfile supersedes it, and this is MultiInputTest.

tomasr8 · 2025-03-18T15:37:23Z

The GNU msgfmt treats duplicated keys as conflicts. Silently ignoring can lead to bugs.

We touched upon this a bit before: #10875 (comment) We could either fix it now or leave it for a followup PR. Which one would you prefer?

serhiy-storchaka · 2025-03-18T18:20:44Z

If the solution is non-trivial, it can wait. I'm just worried that it may require rewriting the tests added here.

Tools/i18n/msgfmt.py

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>

s-ball · 2025-03-18T21:56:09Z

If the solution is non-trivial, it can wait. I'm just worried that it may require rewriting the tests added here.

It would be rather simple to test if a new msgid already exists and raise an exception. But GNU msgfmt gives more: the exact file and line of the first occurrence and of the second one. That part will require a non trivial work... For the tests part, the only change would be that the current merge should abort (the "" msgid is present in both files), and that we should provide a copy of the second PO file without its header to obtain the current merge.

Remove one redundant test from test_msgfmt.py Remove unneeded changes from msgfmt.py

… multi_inputs

serhiy-storchaka · 2025-03-19T09:12:26Z

Tools/i18n/msgfmt.py


+def get_names(filename, outfile):


It is no longer used.

serhiy-storchaka · 2025-03-19T09:20:26Z

Tools/i18n/msgfmt.py

+        # each PO file generates its corresponding MO file
+        for filename in filenames:
+            messages = {}
+            outfile = os.path.splitext(filename)[0] + '.mo'


There is a change in the behavior if filename does not end with .po.

For testing, you can use other suffix for one of test PO files.

That is the reason why outfile was computed after infile. As infile is always computed the same way, be outfile present or not, I externalized it in the get_names method when I write my initial patch in 2018. I am sorry but it was a long time ago and I had completely forgotten about that.

The only way I can imagine without code duplication is that process returns its input filename. But I would not find that very nice on a separation of concern point of view. Do you prefer that?

After a second thought, that latter way is more efficient because the output file name is not computed when it was provided. I'll try to implement it.

It is up to you. But computing outfile when it was not used was not elegant.

You can factor out computation of infile (without outfile) into a function and call it in both loops. Or you can pre-process the list of filenames.

The use of normcase() complicates the code (it was the right decision). In the end, your code with get_names() may be not the worst option. It only looked not elegant. In any case, please add more tests or modify existing tests to cover such special cases.

I am hesitant about adding a special test for input file with the .PO extension (the result will be different on Windows and non-Windows.)

I could try to build a dedicated test decorated with @unittest.skipUnless(sys.platform.startswith("win"), "only for Windows")...

Fixes a small bug introduced in previous commit (changed behaviour when an input file has an extension other than .po)

s-ball · 2025-03-20T08:15:58Z

Ok the tests for the infile/outfile computations are passing.

Back to the duplicate ids question, it should not be that hard:

add a lid variable in process to record the lno value when a msgid is found
pass that lid along with infile to the add method and store tuples (msgstr, infile, lid) in the messages dict
in add, test whether a key is present before adding a new record and abort with a message - we now have the file and line number of the previous and current id
in generate replace the 2 occurrences of messages[id] with messages[id][0]

For the tests, the current test_both_with_outputfile should fail because the empty msgid is duplicated in the second file. We should just add a new PO file without header to have a passing test.

This would allow a better GNU msgfmt compatibility.

Do you think that this should go into this PR or into a new one?

s-ball · 2025-03-21T08:49:04Z

Started working on that point, and I fell not on technical but behavior questions. msgfmt chokes if more than one file contains a header, but the header is the only place where the encoding of the PO file can be declared. As GNU gettext contains a number of auxiliary programs that can help to manage a set of PO files this behavior is nice, but msgfmt.py has not them and it would be a serious limitation. My opinion is that msgfmt.py should accept a header in every file but only store the first (or the last?) in the resulting MO file. That way users could process PO files with other encodings than UTF-8. But I would love to have other opinions on that...

If you think that this points requires a longer discussion, or that it should be discussed on a different place, then it means that we should handle it in a followup PR. If you think that just making a special case for the header is reasonable, then I could implement it in a couple of days.

compile_messages parameters order has changed in a previous commit to allow compiling multiple PO files.

StanFromIreland · 2025-04-04T15:08:23Z

There will be a lot of conflicts with my hashing pr… this pr changes quite a lot. What do we want to do?

s-ball · 2025-04-04T17:19:15Z

I was not very happy to reverse to parameters order of compile_messages because I thought it could disturb users if they were used to the precedent order. If there are concurrent PR (which I was absolutely not aware), this is clearly a blocking point. IMHO the only possible ways are to sequence the PR if there are only two of them, or to revert the modification of compile_messages, and use a different function to compile many PO files. I am sorry if I have made a mistake here. It is my first PR and I really do not know the usages. Anyway are there other sources of conflict?

StanFromIreland · 2025-04-04T18:09:34Z

I guess it is up to @serhiy-storchaka who is the expert for msgfmt if I remember correctly as to what order we want these. (Who will have to deal with the conflicts)

serhiy-storchaka · 2025-04-04T20:02:26Z

Before proceeding with this issue, we must solve other issues: --exclude-file, --omit-header, use the source file encoding. I think they are important for this issue.

s-ball · 2025-04-04T21:04:41Z

If I correctly understand, I shall just leave this PR as it is now, without even worrying for conflicts, and only come back when the other issues will be fixed. Not really a problem, but do not forget to wake me when it will be the time to fix the conflicts... Anyway it is far from a very important one even for me as I have understood that I had to use a private copy for my own project. I have already been working that way for more that 6 years after all... Stupid me again! If there are conflicts with other changes, it means that those changes have been merged and that the conflicts are to be resolved... I just shall keep on coming here from time to time and fix them when I see them. But unless being advised to act differently I shall not try to handle the duplicate ids question before the PR is merged in its current state to avoid adding other sources of possible conflicts.

python-cla-bot · 2025-04-18T09:54:24Z

All commit authors signed the Contributor License Agreement.

s-ball added 3 commits December 2, 2018 20:17

Add test for current behaviour of msgfmt.py

73b5ac8

Test option processing, and conversion of one single file with or without the -o option. Also test the little documented behaviour of merging two input files with -o option.

Final fix for bpo-9741, with tests proving it.

5fb1575

Also show that it is now possible to build multiple po files in one single script call.

Update docstrings for the script and make function

b1968e9

the-knights-who-say-ni added the CLA not signed label Dec 3, 2018

bedevere-bot added the awaiting review label Dec 3, 2018

s-ball force-pushed the multi_inputs branch from 57514b5 to b1968e9 Compare December 4, 2018 09:41

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Dec 4, 2018

s-ball added 3 commits December 4, 2018 10:42

Give make a simpler and more consistent interface.

0bc4ad3

Add a Misc/NEWS.d entry.

a9e67b4

Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…

6c59d6c

… multi_inputs

s-ball mannequin mentioned this pull request Apr 10, 2022

msgfmt should be able to merge more than one po file #79516

Open

timothytylee mannequin mentioned this pull request Apr 10, 2022

msgfmt.py generates invalid mo because msgfmt.make() does not clear dictionary #53950

Open

ezio-melotti removed the CLA signed label Jul 13, 2022

s-ball added 2 commits January 23, 2025 14:13

Merge branch 'main' into multi_inputs

1ce22c0

Merge branch 'main' into multi_inputs

863bd97

s-ball force-pushed the multi_inputs branch from d43321e to 863bd97 Compare January 23, 2025 18:17

s-ball and others added 3 commits January 23, 2025 19:32

Fix an import error in test_i18n.py .

4390ede

When merging main into multi_inputs, the reference to os_helper was erroneously removed.

Fix another import error.

008ea27

In 2018, all imports came from the test.support package. They are now splitted among various subpackages.

Merge branch 'main' into multi_inputs

8744743

s-ball and others added 5 commits February 23, 2025 20:26

Merge branch 'main' into multi_inputs

93a6eb7

Revert version number to 1.2

ba26b80

The newly added tests require a 1.2 version.

Merge branch 'main' into multi_inputs

150cea1

Merge branch 'main' into multi_inputs

d9afabb

Merge branch 'main' into multi_inputs

9004387

zware changed the title ~~bpo-35335: explicitely allows msgfmt.py to compile more than one input po files~~ gh-79516: explicitely allows msgfmt.py to compile more than one input po files Feb 25, 2025

s-ball added 2 commits February 27, 2025 17:12

Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…

7556f79

… multi_inputs

Merge branch 'main' into multi_inputs

80947d1

Simplifies the general logic of msgfmt.py

b7a7c48

All the logic is now in make at the cost of a break in compatibility.

serhiy-storchaka reviewed Mar 18, 2025

View reviewed changes

StanFromIreland reviewed Mar 18, 2025

View reviewed changes

Tools/i18n/msgfmt.py Outdated Show resolved Hide resolved

Update Tools/i18n/msgfmt.py

1b3b73e

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>

s-ball added 2 commits March 18, 2025 23:25

Changes per review.

1db2c66

Remove one redundant test from test_msgfmt.py Remove unneeded changes from msgfmt.py

Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…

3a6e1ef

… multi_inputs

serhiy-storchaka reviewed Mar 19, 2025

View reviewed changes

s-ball added 2 commits March 19, 2025 13:52

Removes the now unused get_names function.

743bdc5

Fixes a small bug introduced in previous commit (changed behaviour when an input file has an extension other than .po)

Tests for extension corner cases for msgfmt.py

50e7145

s-ball and others added 3 commits April 4, 2025 09:39

Merge branch 'main' into multi_inputs

caa8955

Fix a parameter inversion in test_msgfmt.py

a4f1769

compile_messages parameters order has changed in a previous commit to allow compiling multiple PO files.

Merge branch 'main' into multi_inputs

11f6e69

python deleted a comment Apr 7, 2025

s-ball added 2 commits April 8, 2025 19:39

Merge branch 'main' into multi_inputs

0ed3107

Merge branch 'main' into multi_inputs

d1e0a26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-79516: allow msgfmt.py to compile multiple input po files #10875

gh-79516: allow msgfmt.py to compile multiple input po files #10875

s-ball commented Dec 3, 2018 •

edited by bedevere-app bot

Loading

the-knights-who-say-ni commented Dec 3, 2018

s-ball commented Jan 23, 2025

s-ball commented Mar 18, 2025

serhiy-storchaka commented Mar 18, 2025

s-ball commented Mar 18, 2025

serhiy-storchaka commented Mar 18, 2025

serhiy-storchaka left a comment

serhiy-storchaka Mar 18, 2025

serhiy-storchaka Mar 18, 2025

serhiy-storchaka Mar 18, 2025

serhiy-storchaka Mar 18, 2025

tomasr8 commented Mar 18, 2025

serhiy-storchaka commented Mar 18, 2025

s-ball commented Mar 18, 2025

serhiy-storchaka Mar 19, 2025

serhiy-storchaka Mar 19, 2025

s-ball Mar 19, 2025

s-ball Mar 19, 2025

serhiy-storchaka Mar 19, 2025

serhiy-storchaka Mar 19, 2025

s-ball Mar 19, 2025

s-ball commented Mar 20, 2025

s-ball commented Mar 21, 2025

StanFromIreland commented Apr 4, 2025

s-ball commented Apr 4, 2025

StanFromIreland commented Apr 4, 2025

serhiy-storchaka commented Apr 4, 2025

s-ball commented Apr 4, 2025 •

edited

Loading

python-cla-bot bot commented Apr 18, 2025 •

edited

Loading

		BEWARE: in previous version of make the first parameter was a string
		containing a single filename.

gh-79516: allow msgfmt.py to compile multiple input po files #10875

Are you sure you want to change the base?

gh-79516: allow msgfmt.py to compile multiple input po files #10875

Conversation

s-ball commented Dec 3, 2018 • edited by bedevere-app bot Loading

the-knights-who-say-ni commented Dec 3, 2018

s-ball commented Jan 23, 2025

s-ball commented Mar 18, 2025

serhiy-storchaka commented Mar 18, 2025

s-ball commented Mar 18, 2025

serhiy-storchaka commented Mar 18, 2025

serhiy-storchaka left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tomasr8 commented Mar 18, 2025

serhiy-storchaka commented Mar 18, 2025

s-ball commented Mar 18, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

s-ball commented Mar 20, 2025

s-ball commented Mar 21, 2025

StanFromIreland commented Apr 4, 2025

s-ball commented Apr 4, 2025

StanFromIreland commented Apr 4, 2025

serhiy-storchaka commented Apr 4, 2025

s-ball commented Apr 4, 2025 • edited Loading

python-cla-bot bot commented Apr 18, 2025 • edited Loading

s-ball commented Dec 3, 2018 •

edited by bedevere-app bot

Loading

s-ball commented Apr 4, 2025 •

edited

Loading

python-cla-bot bot commented Apr 18, 2025 •

edited

Loading