bpo-12499: support custom len function in textwrap.wrap #28136

xi · 2021-09-03T08:13:32Z

I realize that this issue is 10 years old. However, this also provides a generic solution for bpo-24665 (2018), so I believe there is more interest in this than it might appear.

https://bugs.python.org/issue12499

…ter widths This also provides a generic solution for bpo-24665

the-knights-who-say-ni · 2021-09-03T08:13:35Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@xi

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

xi · 2021-09-03T08:15:15Z

I have signed the CLA just before submitting this PR (although with the bpo accout name, which is xi2). Do I have to take additional steps to make that work?

merwok

Thanks for sending a patch with tests!

I will read the discussion on the ticket to see if there were concerns about adding this.

Doc/library/textwrap.rst

Lib/test/test_textwrap.py

Lib/textwrap.py

merwok · 2021-09-03T13:03:58Z

The bot can’t find your contributor agreement, please follow the steps here: https://devguide.python.org/pullrequest/#licensing

merwok · 2021-09-03T18:30:47Z

Lib/test/test_textwrap.py

+
+        text = "123 🔧"
+        expected = ["123", "🔧"]
+        self.check_wrap(text, 5, expected, text_len=text_len)


I feel that more tests are needed; text_len is used at multiple places in the code.

Thanks for adding tests! Could some of them use different texts?

I would like to avoid adding example for languages that I do not speak myself (e.g. Chinese). Are there ways to reach out to the wider community to provide meaningful examples?

Lib/test/test_textwrap.py

tiptenbrink · 2021-11-09T15:07:25Z

Hi, this looks really good, I have some great use cases for this (think wrappers in terminal app packages like blessed). However, it seems to not work well with text_len functions that return zero width for some characters. Take for example the following sample function:

def text_len(text):
    return sum(
        0 if c == 'Q' else 1
        for c in text
    )

If we then run:

wrappr = textwrap.TextWrapper(width=6, text_len=text_len)
print(wrappr.wrap("01QQQQ2Q345678")) (other and total longer than 6)
print(wrappr.wrap("QQQQQ01234")) (other characters less than 6, total longer than 6)

We would expect that in the first case, it breaks after six 1 width characters, i.e. after 5. For the second, we expect it won't break at all. However the results are, respectively:

['01QQQQ', '2Q3456', '78']
['QQQQQ01234']

As you can see, it works fine when calculating if it has to break at all. But when it actually does break, it still counts the zero width characters as having width, which would not be what you expect.

I'll try to dive into this and report if I have found a solution.

tiptenbrink · 2021-11-09T15:58:47Z

The issue is with the following original code (comments are my own):

def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
    ...
        if width < 1:
            space_left = 1
        else:
            space_left = width - cur_len
    ...
        if self.break_long_words:
            end = space_left # <-- HERE
            chunk = reversed_chunks[-1]
            if self.break_on_hyphens and self.text_len(chunk) > space_left:
    ...               
                hyphen = chunk.rfind('-', 0, space_left) # <-- HERE
                if hyphen > 0 and any(c != '-' for c in chunk[:hyphen]):
                    end = hyphen + 1
            cur_line.append(chunk[:end]) # <-- HERE
            reversed_chunks[-1] = chunk[end:] # <-- HERE
    ...

As you can see, it's tacitly assumed the text width is equal to the amount of characters, as it breaks on the index equal to the space left. You need a function to find on which index the width exceeds the line. I suggest the following:

    def _find_width_index(self, text, width):
        """_find_width_index(text : string, width: int)

        Find at which index the text has the required width.
        """
        # In most cases text_len will just use the number of characters, so this heuristic prevents calculating width
        # for each character
        if self.text_len(text[:width]) == width:
            # For character widths greater than one, width can be more than the number of characters
            return min(width, len(text))
        cur_text = ''
        for i, c in enumerate(text):
            cur_text += c
            cur_width = self.text_len(cur_text)
            if cur_width >= width:
                return i+1

And then the following amended _handle_long_word:

    def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
        """_handle_long_word(chunks : [string],
                             cur_line : [string],
                             cur_len : int, width : int)

        Handle a chunk of text (most likely a word, not whitespace) that
        is too long to fit in any line.
        """
        # Figure out when indent is larger than the specified width, and make
        # sure at least one character is stripped off on every pass
        if width < 1:
            space_left = 1
        else:
            space_left = width - cur_len

        # If we're allowed to break long words, then do so: put as much
        # of the next chunk onto the current line as will fit.
        if self.break_long_words:
            chunk = reversed_chunks[-1]
            end = self._find_width_index(chunk, space_left)
            if self.break_on_hyphens and self.text_len(chunk) > space_left:
                # break after last hyphen, but only if there are
                # non-hyphens before it
                hyphen = chunk.rfind('-', 0, end)
                if hyphen > 0 and any(c != '-' for c in chunk[:hyphen]):
                    end = hyphen + 1
            cur_line.append(chunk[:end])
            reversed_chunks[-1] = chunk[end:]
    ...

I've tested this in tiptenbrink/textwrap, see tiptenbrink/textwrap@f93d7a2 for a proper diff.

A question for @merwok: how does a contribution to an existing PR work in this case? Do I need to also sign a CLA?

merwok · 2021-11-09T18:00:00Z

Yes, if you make a PR in xi’s repository and it’s accepted, then this PR here will have code by the both of you, and all contributors need to sign the CLA.

tiptenbrink · 2021-11-09T20:07:13Z

Thanks for the info. I've opened a PR @xi

_find_width_index and _handle_long_word change

xi · 2021-11-11T08:22:29Z

I merged the fix by @tiptenbrink but I am not completely convinced. We make some assumptions that I am not sure we can make:

text_len(a) >= 0
text_len(a) + text_len(b) = text_len(a + b)

Whether we can drop these assumptions depends on the guarantees we want to make: My guess is that "every line will have a maximum text_len of n" can be achieved without these assumptions, but "every line will be the longest possible string that does not exceed a text_len of n" cannot.

This got much more complicated than I had hoped. Should we continue or just drop this?

python-cla-bot · 2025-04-18T09:48:03Z

The following commit authors need to sign the Contributor License Agreement:

75669206+tiptenbrink@users.noreply.github.com

bpo-12499: textwrap.wrap: add control for fonts with different charac…

df3f367

…ter widths This also provides a generic solution for bpo-24665

the-knights-who-say-ni added the CLA not signed label Sep 3, 2021

bedevere-bot added the awaiting review label Sep 3, 2021

merwok removed the CLA not signed label Sep 3, 2021

the-knights-who-say-ni added the CLA not signed label Sep 3, 2021

merwok reviewed Sep 3, 2021

View reviewed changes

Doc/library/textwrap.rst Show resolved Hide resolved

Lib/test/test_textwrap.py Outdated Show resolved Hide resolved

Lib/test/test_textwrap.py Outdated Show resolved Hide resolved

Lib/textwrap.py Show resolved Hide resolved

xi added 3 commits September 3, 2021 18:09

react to feedback

97a2ec8

typo

db82b6c

fix missing len occurences

f457e20

merwok reviewed Sep 3, 2021

View reviewed changes

add more tests

e164780

merwok reviewed Sep 4, 2021

View reviewed changes

Lib/test/test_textwrap.py Outdated Show resolved Hide resolved

merwok changed the title ~~bpo-12499: textwrap.wrap: add control for fonts with different character widths~~ bpo-12499: support custom len function in textwrap.wrap Sep 4, 2021

xi added 2 commits September 5, 2021 07:50

fix idle test

09ce4ce

optimize text_len function

bf8dad5

xi requested a review from terryjreedy as a code owner September 5, 2021 05:51

merwok removed the CLA not signed label Sep 5, 2021

the-knights-who-say-ni added the CLA not signed label Sep 5, 2021

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Oct 3, 2021

fixup

68e8098

_find_width_index and _handle_long_word change

e5d6d88

eragaxshim mentioned this pull request Nov 9, 2021

_find_width_index and _handle_long_word change xi/cpython#1

Closed

tiptenbrink mentioned this pull request Nov 9, 2021

_find_width_index and _handle_long_word change xi/cpython#2

Merged

tiptenbrink and others added 2 commits November 10, 2021 17:38

Apply changes, ensure min 1 char on line

7b32d0b

Merge pull request #2 from tiptenbrink/textwrap-len

b811049

_find_width_index and _handle_long_word change

merwok removed the CLA signed label Nov 11, 2021

the-knights-who-say-ni added the CLA signed label Nov 11, 2021

ezio-melotti removed the CLA signed label Jul 13, 2022

saulpw mentioned this pull request Jun 5, 2024

[multiline] wordwrap for non-single-width characters (CJK) saulpw/visidata#2416

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-12499: support custom len function in textwrap.wrap #28136

bpo-12499: support custom len function in textwrap.wrap #28136

xi commented Sep 3, 2021 •

edited by bedevere-bot

Loading

the-knights-who-say-ni commented Sep 3, 2021

xi commented Sep 3, 2021

merwok left a comment

merwok commented Sep 3, 2021

merwok Sep 3, 2021

merwok Sep 4, 2021

xi Sep 5, 2021

tiptenbrink commented Nov 9, 2021

tiptenbrink commented Nov 9, 2021 •

edited

Loading

merwok commented Nov 9, 2021

tiptenbrink commented Nov 9, 2021

xi commented Nov 11, 2021 •

edited

Loading

python-cla-bot bot commented Apr 18, 2025

bpo-12499: support custom len function in textwrap.wrap #28136

Are you sure you want to change the base?

bpo-12499: support custom len function in textwrap.wrap #28136

Conversation

xi commented Sep 3, 2021 • edited by bedevere-bot Loading

the-knights-who-say-ni commented Sep 3, 2021

CLA Missing

xi commented Sep 3, 2021

merwok left a comment

Choose a reason for hiding this comment

merwok commented Sep 3, 2021

merwok Sep 3, 2021

Choose a reason for hiding this comment

merwok Sep 4, 2021

Choose a reason for hiding this comment

xi Sep 5, 2021

Choose a reason for hiding this comment

tiptenbrink commented Nov 9, 2021

tiptenbrink commented Nov 9, 2021 • edited Loading

merwok commented Nov 9, 2021

tiptenbrink commented Nov 9, 2021

xi commented Nov 11, 2021 • edited Loading

python-cla-bot bot commented Apr 18, 2025

xi commented Sep 3, 2021 •

edited by bedevere-bot

Loading

tiptenbrink commented Nov 9, 2021 •

edited

Loading

xi commented Nov 11, 2021 •

edited

Loading