-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-79516: allow msgfmt.py to compile multiple input po files #10875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Test option processing, and conversion of one single file with or without the -o option. Also test the little documented behaviour of merging two input files with -o option.
Also show that it is now possible to build multiple po files in one single script call.
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
When merging main into multi_inputs, the reference to os_helper was erroneously removed.
In 2018, all imports came from the test.support package. They are now splitted among various subpackages.
My patches now successfully pass all tests. Is there anything else I should do? |
The newly added tests require a 1.2 version.
Please forgive my previous comment. I was tired and should not have posted it. I wanted to keep a compatibility with the previous
in the docstring. As I now understand that any internal interface of a module from the Tools folder is an internal detail, it would indeed be much simpler and clearer to always pass an iterable of strings to make. I shall soon push a new commit doing that. |
AFAIK, the differences between the GNU
So, combining multiple PO source files is needed for consistency with the GNU
Removing the global MESSAGES is not necessary for fixing the The simplest fix of the original issue is: diff --git a/Tools/i18n/msgfmt.py b/Tools/i18n/msgfmt.py
index 878d3e29077..d9c46fcd166 100755
--- a/Tools/i18n/msgfmt.py
+++ b/Tools/i18n/msgfmt.py
@@ -250,6 +250,9 @@ def main():
return
for filename in args:
+ if outfile is None:
+ global MESSAGES
+ MESSAGES = {}
make(filename, outfile)
The code is inefficient if the output file is specified, because it will be rewritten again and again for every input file, but it will work. The code needs rewriting for efficiency and clarity, and for better API if we decide to provide it in future. |
All the logic is now in make at the cost of a break in compatibility.
@serhiy-storchaka I am afraid I did not fully understand your last comment. Do you really want me to bring back the global |
No, it is only demonstration that all other changes is a beautification of the code. We usually does not allow pure cosmetic changes, but we can allow them if they are related to the behavior changes and make the final code clearer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GNU msgfmt
treats duplicated keys as conflicts. Silently ignoring can lead to bugs.
Tools/i18n/msgfmt.py
Outdated
BEWARE: in previous version of make the first parameter was a string | ||
containing a single filename. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This note will soon became confusing, as it is not clear what version is previous. The right place for it is in the commit message, not in docstring.
Tools/i18n/msgfmt.py
Outdated
# each PO file generates its corresponding MO file | ||
for filename in filenames: | ||
messages = {} | ||
infile, outfile = get_names(filename, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we can refactor get_names()
. Split it on two parts -- for infile and for outfile. The first part can be inlined and moved in process()
, and the second part can be left here, in make()
.
Tools/i18n/msgfmt.py
Outdated
ID = 1 | ||
STR = 2 | ||
CTXT = 3 | ||
def make(filenames, outfile=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No default value is needed.
Lib/test/test_tools/test_msgfmt.py
Outdated
"""Tests for multiple input files | ||
""" | ||
|
||
def test_no_outputfile(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks redundant. test_both_without_outputfile
supersedes it, and this is MultiInputTest.
We touched upon this a bit before: #10875 (comment) We could either fix it now or leave it for a followup PR. Which one would you prefer? |
If the solution is non-trivial, it can wait. I'm just worried that it may require rewriting the tests added here. |
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
It would be rather simple to test if a new msgid already exists and raise an exception. But GNU msgfmt gives more: the exact file and line of the first occurrence and of the second one. That part will require a non trivial work... For the tests part, the only change would be that the current merge should abort (the |
Remove one redundant test from test_msgfmt.py Remove unneeded changes from msgfmt.py
Tools/i18n/msgfmt.py
Outdated
|
||
def get_names(filename, outfile): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is no longer used.
Tools/i18n/msgfmt.py
Outdated
# each PO file generates its corresponding MO file | ||
for filename in filenames: | ||
messages = {} | ||
outfile = os.path.splitext(filename)[0] + '.mo' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a change in the behavior if filename
does not end with .po
.
For testing, you can use other suffix for one of test PO files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the reason why outfile
was computed after infile
. As infile
is always computed the same way, be outfile
present or not, I externalized it in the get_names
method when I write my initial patch in 2018. I am sorry but it was a long time ago and I had completely forgotten about that.
The only way I can imagine without code duplication is that process
returns its input filename. But I would not find that very nice on a separation of concern point of view. Do you prefer that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a second thought, that latter way is more efficient because the output file name is not computed when it was provided. I'll try to implement it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is up to you. But computing outfile
when it was not used was not elegant.
You can factor out computation of infile
(without outfile
) into a function and call it in both loops. Or you can pre-process the list of filenames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of normcase()
complicates the code (it was the right decision). In the end, your code with get_names()
may be not the worst option. It only looked not elegant. In any case, please add more tests or modify existing tests to cover such special cases.
I am hesitant about adding a special test for input file with the .PO
extension (the result will be different on Windows and non-Windows.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could try to build a dedicated test decorated with @unittest.skipUnless(sys.platform.startswith("win"), "only for Windows")
...
Fixes a small bug introduced in previous commit (changed behaviour when an input file has an extension other than .po)
Ok the tests for the infile/outfile computations are passing. Back to the duplicate ids question, it should not be that hard:
For the tests, the current This would allow a better GNU msgfmt compatibility. Do you think that this should go into this PR or into a new one? |
Started working on that point, and I fell not on technical but behavior questions. If you think that this points requires a longer discussion, or that it should be discussed on a different place, then it means that we should handle it in a followup PR. If you think that just making a special case for the header is reasonable, then I could implement it in a couple of days. |
compile_messages parameters order has changed in a previous commit to allow compiling multiple PO files.
There will be a lot of conflicts with my hashing pr… this pr changes quite a lot. What do we want to do? |
I was not very happy to reverse to parameters order of |
I guess it is up to @serhiy-storchaka who is the expert for msgfmt if I remember correctly as to what order we want these. (Who will have to deal with the conflicts) |
Before proceeding with this issue, we must solve other issues: |
|
msgfmt.py (from Tools/i18n) can now reliably be passed more than one input po file.
In addition, its
make
central function can reliably be called repeatedly, which fixes bpo-9741https://bugs.python.org/issue35335