gh-130703: Implement wrapping to width for msgids #130705

StanFromIreland · 2025-02-28T19:29:18Z

Issue: pygettext: Wrapping to width is not implemented for msgids #130703

StanFromIreland · 2025-02-28T19:30:27Z

Requesting @tomasr8 @serhiy-storchaka :-)

serhiy-storchaka

This does not work.

It can break escape sequences.
The normalized message can already be multiline. Splitting it again will produce too short lines and even empty lines.
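
The first objection can be reproduced with a minimal sketch. The `naive_wrap` helper below is hypothetical (it is not the code from this PR); it only shows what goes wrong when an already-escaped string is cut at a fixed width without looking at its contents.

```python
def naive_wrap(escaped, width):
    """Hypothetical naive wrapper: cut the escaped text every `width` characters."""
    return [escaped[i:i + width] for i in range(0, len(escaped), width)]


# The escaped form of "ab<newline>cd" is the six characters: a b \ n c d
escaped = "ab\\ncd"
chunks = naive_wrap(escaped, 3)

# The cut falls between the backslash and the 'n', so the first chunk ends
# with a lone backslash and the escape sequence is broken.
print(chunks)
```

Any wrapping logic therefore has to treat an escape sequence as an indivisible unit, or wrap before escaping.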

StanFromIreland · 2025-02-28T20:10:53Z

I need to update normalize so that it wraps on word boundaries.

picnixz

Can't you use textwrap.wrap for wrapping? It's not perfect, but it ought to do most of the job?

tomasr8 · 2025-02-28T20:40:02Z

I'm afraid textwrap won't always work. I suggest adding the wrapping logic to the normalize function. pybabel does it in a similar way, you can have a look at their implementation: https://github.com/python-babel/babel/blob/master/babel/messages/pofile.py#L464
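
The comment does not say why textwrap falls short. One plausible reason (an assumption, not something stated in this thread) is that `textwrap.wrap` drops the whitespace at each break by default, so the wrapped pieces no longer concatenate back to the original message, which a msgid requires:

```python
import textwrap

msgid = "foo bar baz"

# With default settings, whitespace at the start and end of each wrapped
# line is dropped.
lines = textwrap.wrap(msgid, width=4)
print(lines)  # ['foo', 'bar', 'baz']

# Joining the pieces does not give back the original text: the spaces are gone.
rejoined = "".join(lines)
print(rejoined == msgid)  # False
```

In a PO file the continuation lines are concatenated verbatim, so every space has to survive in one of the pieces.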

StanFromIreland · 2025-03-01T09:52:29Z

Implemented pybabel's method.

picnixz

I'm not really the best person to review this, but I can review the implementation. Please don't just apply my suggestions as-is; decide which one is best.

Tools/i18n/pygettext.py

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

picnixz

I'm not sure that the regex approach is correct. It would gobble up consecutive spaces, right?
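
The concern can be checked in isolation. The patterns below are illustrative only; the exact regex used in the PR at this point is not shown in the thread. Splitting on `\s+` discards the separators, while wrapping the pattern in a capturing group keeps them as their own chunks:

```python
import re

text = "a  b   c"  # two spaces, then three spaces

# Splitting on whitespace and re-joining with a single space loses the runs.
lossy = " ".join(re.split(r"\s+", text))
print(repr(lossy))  # 'a b c'

# A capturing group makes re.split return the separators as well,
# so the chunks concatenate back to the original text.
chunks = re.split(r"(\s+)", text)
lossless = "".join(chunks)
print(chunks)  # ['a', '  ', 'b', '   ', 'c']
print(lossless == text)  # True
```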

Tools/i18n/pygettext.py

tomasr8 · 2025-03-02T10:09:58Z

I really recommend creating a dummy file with some gettext calls and comparing the output of pygettext, xgettext and babel. There are some differences that should be considered. Here are two I noticed:

The header is not wrapped but both xgettext and babel do wrap it.
This file:

_('foos')

run with --width=3 produces this output:

msgid ""
""
"foos"
msgstr ""

while xgettext and babel give me this (i.e. they don't insert two extra "" when the line does not get wrapped):

msgid "foos"
msgstr ""
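
The behaviour described for xgettext and babel can be sketched as follows. `format_msgid` is a hypothetical helper, not a function from pygettext; it assumes the message has already been escaped and split into lines, and it only emits the leading empty string when there is more than one line.

```python
def format_msgid(lines):
    """Hypothetical helper: render escaped, already-wrapped lines as a msgid."""
    if len(lines) == 1:
        # Nothing was wrapped: keep the message on the msgid line itself.
        return f'msgid "{lines[0]}"'
    # The message was wrapped: start with an empty string and put each
    # piece on its own continuation line.
    body = "\n".join(f'"{line}"' for line in lines)
    return f'msgid ""\n{body}'


print(format_msgid(["foos"]))
print(format_msgid(["foo ", "bar"]))
```

With this shape, a single word that cannot be split is written on one line even when it exceeds the requested width.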

StanFromIreland · 2025-03-02T10:24:23Z

As for the header, this will conflict with my implementation of --omit-header, could that get merged first (or vice versa)?

StanFromIreland · 2025-03-02T10:58:56Z

The test failure is unrelated.

Wrapping the header will require a separate function, like so:

Subject: [PATCH] Wrap header
---
Index: Tools/i18n/pygettext.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/Tools/i18n/pygettext.py b/Tools/i18n/pygettext.py
--- a/Tools/i18n/pygettext.py	(revision 8d03cbf141068c4ac9812a967a4c9f5942e22d75)
+++ b/Tools/i18n/pygettext.py	(date 1740913002600)
@@ -589,12 +589,36 @@
     def _is_string_const(self, node):
         return isinstance(node, ast.Constant) and isinstance(node.value, str)
 
+
+def _wrap_header(s, options):
+    lines = []
+    for line in s.splitlines():
+        if len(line) > options.width and ' ' in line:
+            words = _space_splitter(line)
+            words.reverse()
+            buf = []
+            size = 0
+            while words:
+                word = words.pop()
+                if size + len(word) <= options.width:
+                    buf.append(word)
+                    size += len(word)
+                else:
+                    lines.append(''.join(buf))
+                    buf = [word]
+                    size = len(word)
+            lines.append(''.join(buf))
+        else:
+            lines.append(line)
+    return "\n".join(lines) + "\n"
+
+
 def write_pot_file(messages, options, fp):
     timestamp = time.strftime('%Y-%m-%d %H:%M%z')
     encoding = fp.encoding if fp.encoding else 'UTF-8'
-    print(pot_header % {'time': timestamp, 'version': __version__,
+    print(_wrap_header(pot_header % {'time': timestamp, 'version': __version__,
                         'charset': encoding,
-                        'encoding': '8bit'}, file=fp)
+                        'encoding': '8bit'}, options), file=fp)
 
     # Sort locations within each message by filename and lineno
     sorted_keys = [
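
The greedy loop in the patch can be tried outside pygettext with a standalone sketch. `_space_splitter` is not defined in the diff above; the definition used here (a regex split that keeps whitespace chunks) is an assumption, and `wrap_header_line` is a hypothetical name for the per-line part of `_wrap_header`.

```python
import re

# Assumption: the splitter keeps whitespace runs as separate chunks.
_space_splitter = re.compile(r"(\s+)").split


def wrap_header_line(line, width):
    """Greedily pack chunks of one line into pieces of at most `width` chars."""
    if len(line) <= width or " " not in line:
        return [line]
    lines, buf, size = [], [], 0
    for word in _space_splitter(line):
        if size + len(word) <= width:
            buf.append(word)
            size += len(word)
        else:
            # The current piece is full: flush it and start a new one.
            lines.append("".join(buf))
            buf = [word]
            size = len(word)
    lines.append("".join(buf))
    return lines


print(wrap_header_line("aaa bbb ccc", 7))   # ['aaa bbb', ' ccc']
print(wrap_header_line("aaaaaaaaaa b", 3))  # ['', 'aaaaaaaaaa', ' b']
```

The second call shows a property of this loop worth testing: when the first word is already longer than the width, the buffer is flushed while still empty, which produces an empty first piece.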

Tools/i18n/pygettext.py

serhiy-storchaka

It looks almost ready now. Please add more tests for different cases. It may be convenient to use the same string with different widths.

Add tests for the cases when len(escaped_line) + len(prefix) + 3 equals width and when it equals width + 1.

Add tests for the cases when new_size + 2 equals width and when it equals width + 1.

Add tests for a too-long first word (new_size + 2 > width and buf is empty) and for a too-long last word.

Add tests for whitespace other than ' ' and '\n' (e.g. '\t' and '\r'), and for non-ASCII line separators and whitespace. Test the different escaping modes.

Do not add a separate method for every case. Group assertions for similar cases in one method.
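
One way to follow this advice is a single test method that runs the same string through several widths with `subTest`. The sketch below is illustrative only: `wrap_words` is a hypothetical stand-in for the logic under test, not the PR's normalize function, and the widths are chosen to hit the "exactly fits", "one over" and "every chunk too long" boundaries.

```python
import re
import unittest


def wrap_words(text, width):
    """Hypothetical stand-in for the wrapping logic under test."""
    lines, buf, size = [], [], 0
    for chunk in re.split(r"(\s+)", text):
        # Flush only a non-empty buffer, so no empty pieces are produced.
        if buf and size + len(chunk) > width:
            lines.append("".join(buf))
            buf, size = [], 0
        buf.append(chunk)
        size += len(chunk)
    if buf:
        lines.append("".join(buf))
    return lines


class WrapBoundaryTests(unittest.TestCase):
    def test_same_string_different_widths(self):
        text = "foo bar"
        cases = {
            7: ["foo bar"],           # the text fits exactly
            6: ["foo ", "bar"],       # one character over the limit
            2: ["foo", " ", "bar"],   # every word is longer than the width
        }
        for width, expected in cases.items():
            with self.subTest(width=width):
                self.assertEqual(wrap_words(text, width), expected)
```

Keeping the cases in one table makes it easy to add further widths, or further strings containing '\t', '\r' or non-ASCII whitespace, without adding a method per case.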

Tools/i18n/pygettext.py

Lib/test/test_tools/test_i18n.py

Tools/i18n/pygettext.py

Lib/test/test_tools/test_i18n.py

StanFromIreland · 2025-03-13T18:46:51Z

Friendly ping @serhiy-storchaka :-)

serhiy-storchaka · 2025-03-14T10:55:27Z

Sorry, the tests still do not satisfy me. I am going to play with them myself, and then propose my variant.

Add logic to wrap and test

b3ccc45

bedevere-app bot added the awaiting review label Feb 28, 2025

bedevere-app bot mentioned this pull request Feb 28, 2025

pygettext: Wrapping to width is not implemented for msgids #130703

Open

StanFromIreland added 2 commits February 28, 2025 19:31

Fix NEWS name -- We don't want miliseconds

33149ed

Change extract func in test

0e35e36

serhiy-storchaka reviewed Feb 28, 2025

StanFromIreland marked this pull request as draft February 28, 2025 20:10

bedevere-app bot removed the awaiting review label Feb 28, 2025

picnixz reviewed Feb 28, 2025

Use a modified version of pybabel's code in normalize

92f227f

StanFromIreland requested review from serhiy-storchaka and picnixz March 1, 2025 09:51

Minor tweak

f0ee9c4

picnixz reviewed Mar 1, 2025

StanFromIreland and others added 2 commits March 1, 2025 10:17

Update argparse snapshot

843e3fa

Bénédikt's suggestions

7fc34ca

StanFromIreland requested a review from picnixz March 1, 2025 10:19

StanFromIreland marked this pull request as ready for review March 1, 2025 10:20

StanFromIreland requested a review from savannahostrowski as a code owner March 1, 2025 10:20

bedevere-app bot added the awaiting review label Mar 1, 2025

picnixz reviewed Mar 1, 2025

StanFromIreland and others added 2 commits March 1, 2025 11:03

Preserve spaces and remove unnecessary checks

8d319b4

Improve comment

9197688

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

picnixz reviewed Mar 1, 2025

Add test and sort imports

7c8637e

StanFromIreland requested a review from picnixz March 1, 2025 11:19

More of Benedikt's suggestions

4b02678

StanFromIreland requested a review from picnixz March 2, 2025 09:59

Don't wrap for single words

8d03cbf

serhiy-storchaka reviewed Mar 2, 2025

Address Serhiy's suggestions

fbe5b93

StanFromIreland requested a review from serhiy-storchaka March 2, 2025 15:02

Use more complex pattern

8d5f84f

serhiy-storchaka reviewed Mar 2, 2025

picnixz removed their request for review March 2, 2025 17:18

Serhiy's suggestions

ae53774

StanFromIreland requested a review from serhiy-storchaka March 2, 2025 17:22

serhiy-storchaka reviewed Mar 3, 2025

StanFromIreland added 2 commits March 3, 2025 18:48

Serhiy's suggestions

794fc8b

Clean up

47bfa29

StanFromIreland requested a review from serhiy-storchaka March 3, 2025 18:49

tomasr8 reviewed Mar 3, 2025

Apply suggestions from Tomas

b6f128f

serhiy-storchaka reviewed Mar 5, 2025

Apply suggestions from Serhiy

a4823a7

StanFromIreland requested review from serhiy-storchaka, picnixz and tomasr8 March 5, 2025 18:50

picnixz removed their request for review March 23, 2025 12:26

serhiy-storchaka self-assigned this Apr 9, 2025

StanFromIreland mentioned this pull request Apr 27, 2025

pygettext: Improve test coverage #130197

Open

18 tasks
