-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-130703: Implement wrapping to width for msgids #130705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Requesting @tomasr8 @serhiy-storchaka :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not work.
- It can break escape sequences.
- The normalized message can already be multiline. Splitting it again will produce too short lines and even empty lines.
I need to update normalize to wrap respecting words |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you use textwrap.wrap
for wrapping? It's not perfect but it ought to detect do most of the job?
I'm afraid textwrap won't always work. I suggest adding the wrapping logic to the normalize function. pybabel does it in a similar way, you can have a look at their implementation: https://github.com/python-babel/babel/blob/master/babel/messages/pofile.py#L464 |
Implemented pybabels method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really the best for reviewing this but I can review the implementation. Please, don't just apply my suggestions as is and decide which one is the best.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that the regex approach is correct. It would gobble up consecutive spaces right?
I really recommend creating a dummy file with some gettext calls and comparing the differences between pygettext, xgettext and babel. There are some differences that should be considered. Here's two I noticed:
_('foos') ran with msgid ""
""
"foos"
msgstr "" while xgettext and babel give me this (i.e. they don't insert two extra msgid "foos"
msgstr "" |
As for the header, this will conflict with my implementation of |
Test fail unrelated. Wrapping header will require a separate function like so: Subject: [PATCH] Wrap header
---
Index: Tools/i18n/pygettext.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/Tools/i18n/pygettext.py b/Tools/i18n/pygettext.py
--- a/Tools/i18n/pygettext.py (revision 8d03cbf141068c4ac9812a967a4c9f5942e22d75)
+++ b/Tools/i18n/pygettext.py (date 1740913002600)
@@ -589,12 +589,36 @@
def _is_string_const(self, node):
return isinstance(node, ast.Constant) and isinstance(node.value, str)
+
+def _wrap_header(s, options):
+ lines = []
+ for line in s.splitlines():
+ if len(line) > options.width and ' ' in line:
+ words = _space_splitter(line)
+ words.reverse()
+ buf = []
+ size = 0
+ while words:
+ word = words.pop()
+ if size + len(word) <= options.width:
+ buf.append(word)
+ size += len(word)
+ else:
+ lines.append(''.join(buf))
+ buf = [word]
+ size = len(word)
+ lines.append(''.join(buf))
+ else:
+ lines.append(line)
+ return "\n".join(lines) + "\n"
+
+
def write_pot_file(messages, options, fp):
timestamp = time.strftime('%Y-%m-%d %H:%M%z')
encoding = fp.encoding if fp.encoding else 'UTF-8'
- print(pot_header % {'time': timestamp, 'version': __version__,
+ print(_wrap_header(pot_header % {'time': timestamp, 'version': __version__,
'charset': encoding,
- 'encoding': '8bit'}, file=fp)
+ 'encoding': '8bit'}, options), file=fp)
# Sort locations within each message by filename and lineno
sorted_keys = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks almost ready now. Please add more tests for cases. It may be convenient to use the same string with different widths.
Add tests for the cases when len(escaped_line) + len(prefix) + 3
equals to width
and when it equals to width
.
Add tests for the cases when new_size + 2
equals to width
and when it equals to width + 1
.
Add tests for too long first word (new_size + 2 >
widthand
buf` is empty) and for too long last word.
Add tests for whitespaces other than ' '
and '\n'
(e.g. for '\t'
and '\r'
), for non-ASCII line separators and whitespaces. Test for different escaping mode.
Do not add a separate method for every case. Group assertions for similar cases in one method.
Friendly ping @serhiy-storchaka :-) |
Sorry, the tests still do not satisfy me. I am going to play with them myself, and then propose my variant. |
width
is not implemented for msgids #130703