-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
bpo-12499: support custom len function in textwrap.wrap #28136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ter widths This also provides a generic solution for bpo-24665
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA MissingOur records indicate the following people have not signed the CLA: For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
I have signed the CLA just before submitting this PR (although with the bpo accout name, which is xi2). Do I have to take additional steps to make that work? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sending a patch with tests!
I will read the discussion on the ticket to see if there were concerns about adding this.
The bot can’t find your contributor agreement, please follow the steps here: https://devguide.python.org/pullrequest/#licensing |
Lib/test/test_textwrap.py
Outdated
|
||
text = "123 🔧" | ||
expected = ["123", "🔧"] | ||
self.check_wrap(text, 5, expected, text_len=text_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that more tests are needed; text_len is used at multiple places in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding tests! Could some of them use different texts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid adding example for languages that I do not speak myself (e.g. Chinese). Are there ways to reach out to the wider community to provide meaningful examples?
Hi, this looks really good, I have some great use cases for this (think wrappers in terminal app packages like blessed). However, it seems to not work well with text_len functions that return zero width for some characters. Take for example the following sample function:
If we then run:
We would expect that in the first case, it breaks after six 1 width characters, i.e. after 5. For the second, we expect it won't break at all. However the results are, respectively:
As you can see, it works fine when calculating if it has to break at all. But when it actually does break, it still counts the zero width characters as having width, which would not be what you expect. I'll try to dive into this and report if I have found a solution. |
The issue is with the following original code (comments are my own): def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
...
if width < 1:
space_left = 1
else:
space_left = width - cur_len
...
if self.break_long_words:
end = space_left # <-- HERE
chunk = reversed_chunks[-1]
if self.break_on_hyphens and self.text_len(chunk) > space_left:
...
hyphen = chunk.rfind('-', 0, space_left) # <-- HERE
if hyphen > 0 and any(c != '-' for c in chunk[:hyphen]):
end = hyphen + 1
cur_line.append(chunk[:end]) # <-- HERE
reversed_chunks[-1] = chunk[end:] # <-- HERE
... As you can see, it's tacitly assumed the text width is equal to the amount of characters, as it breaks on the index equal to the space left. You need a function to find on which index the width exceeds the line. I suggest the following: def _find_width_index(self, text, width):
"""_find_width_index(text : string, width: int)
Find at which index the text has the required width.
"""
# In most cases text_len will just use the number of characters, so this heuristic prevents calculating width
# for each character
if self.text_len(text[:width]) == width:
# For character widths greater than one, width can be more than the number of characters
return min(width, len(text))
cur_text = ''
for i, c in enumerate(text):
cur_text += c
cur_width = self.text_len(cur_text)
if cur_width >= width:
return i+1 And then the following amended _handle_long_word: def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
"""_handle_long_word(chunks : [string],
cur_line : [string],
cur_len : int, width : int)
Handle a chunk of text (most likely a word, not whitespace) that
is too long to fit in any line.
"""
# Figure out when indent is larger than the specified width, and make
# sure at least one character is stripped off on every pass
if width < 1:
space_left = 1
else:
space_left = width - cur_len
# If we're allowed to break long words, then do so: put as much
# of the next chunk onto the current line as will fit.
if self.break_long_words:
chunk = reversed_chunks[-1]
end = self._find_width_index(chunk, space_left)
if self.break_on_hyphens and self.text_len(chunk) > space_left:
# break after last hyphen, but only if there are
# non-hyphens before it
hyphen = chunk.rfind('-', 0, end)
if hyphen > 0 and any(c != '-' for c in chunk[:hyphen]):
end = hyphen + 1
cur_line.append(chunk[:end])
reversed_chunks[-1] = chunk[end:]
... I've tested this in tiptenbrink/textwrap, see tiptenbrink/textwrap@f93d7a2 for a proper diff. A question for @merwok: how does a contribution to an existing PR work in this case? Do I need to also sign a CLA? |
Yes, if you make a PR in xi’s repository and it’s accepted, then this PR here will have code by the both of you, and all contributors need to sign the CLA. |
Thanks for the info. I've opened a PR @xi |
_find_width_index and _handle_long_word change
I merged the fix by @tiptenbrink but I am not completely convinced. We make some assumptions that I am not sure we can make:
Whether we can drop these assumptions depends on the guarantees we want to make: My guess is that "every line will have a maximum text_len of n" can be achieved without these assumptions, but "every line will be the longest possible string that does not exceed a text_len of n" cannot. This got much more complicated than I had hoped. Should we continue or just drop this? |
The following commit authors need to sign the Contributor License Agreement: |
I realize that this issue is 10 years old. However, this also provides a generic solution for bpo-24665 (2018), so I believe there is more interest in this than it might appear.
https://bugs.python.org/issue12499