bpo-34222: Lib/email: Fix infinite loop when folding #8990

Closed
wants to merge 2 commits into from
23 changes: 17 additions & 6 deletions Lib/email/_header_value_parser.py
@@ -2726,12 +2726,23 @@ def _fold_as_ew(to_encode, lines, maxlen, last_ew, ew_combine_allowed, charset):
             continue
         first_part = to_encode[:text_space]
         ew = _ew.encode(first_part, charset=encode_as)
-        excess = len(ew) - remaining_space
-        if excess > 0:
-            # encode always chooses the shortest encoding, so this
-            # is guaranteed to fit at this point.
-            first_part = first_part[:-excess]
-            ew = _ew.encode(first_part)
+        if len(ew) > remaining_space:
+            # Find the longest first_part
+            # since len(_ew.encode(to_encode[:x])) is a non-linear
+            # monotonically increasing function, and calculating the

_ew.encode is biased towards the 'q' encoding. This might violate the assumption of a monotonically increasing function for some corner cases. (This was already the case for the old code.)

I hope to find the time to write a test case for this.
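
A test along these lines could start from a small probe like the one below. It is only a sketch and not part of this PR: `find_length_decreases` is a hypothetical helper name, and it relies on the private module `email._encoded_words`, whose `encode()` picks 'q' or 'b' itself when no encoding is given. It reports every prefix length at which the encoded word gets shorter as one more character is added, which is where the monotonicity assumption would break.

```python
from email import _encoded_words as _ew  # private module; may change between versions


def find_length_decreases(text, charset="utf-8"):
    """Return prefix lengths n where encode(text[:n]) is shorter than encode(text[:n-1]).

    Hypothetical helper for illustration only. An empty list means the
    encoded length never decreased for this particular input.
    """
    decreases = []
    previous = len(_ew.encode("", charset=charset))
    for n in range(1, len(text) + 1):
        current = len(_ew.encode(text[:n], charset=charset))
        if current < previous:
            decreases.append(n)
        previous = current
    return decreases


if __name__ == "__main__":
    # ASCII letters pass through the 'q' encoding unchanged, so no decrease is expected.
    print(find_length_decreases("a" * 30))
    # Mixed input is where a switch between 'q' and 'b' could matter.
    print(find_length_decreases("ü" + "a" * 20 + "ü" * 10))
```

Any non-empty result for some input would be a candidate corner case for the test mentioned above.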

+            # exactly length requires knowing the internal of _ew.encode
+            # which seems dirty, use binary search here.
+            part_len_l = 0
+            part_len_r = text_space
+            while part_len_l + 1 < part_len_r:
+                part_len_m = (part_len_l + part_len_r) // 2
+                ew = _ew.encode(first_part[:part_len_m], charset=encode_as)
+                if len(ew) <= remaining_space:
+                    part_len_l = part_len_m
+                else:
+                    part_len_r = part_len_m
+            first_part = to_encode[:part_len_l]
+            ew = _ew.encode(first_part, charset=encode_as)
         lines[-1] += ew
         to_encode = to_encode[len(first_part):]
         if to_encode:
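
The binary search in this hunk can be sketched on its own, outside `_fold_as_ew`. This is an illustration rather than the patched code: `longest_fitting_prefix` is a hypothetical name, it calls the private `email._encoded_words.encode`, and like the patch it assumes the encoded length does not decrease as the prefix grows. Unlike the patch, it uses `len(text) + 1` as the upper bound so that the whole text can be returned when it fits.

```python
from email import _encoded_words as _ew  # private module; may change between versions


def longest_fitting_prefix(text, remaining_space, charset="utf-8"):
    """Return the largest n for which the encoded word of text[:n] fits.

    Hypothetical helper for illustration. Returns 0 when not even one
    character fits. Assumes the encoded length is non-decreasing in n.
    """
    lo = 0               # largest prefix length known to fit (0 = nothing yet)
    hi = len(text) + 1   # sentinel: treated as "does not fit"
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if len(_ew.encode(text[:mid], charset=charset)) <= remaining_space:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Each "虾" is 3 UTF-8 bytes, i.e. 4 base64 characters, plus 12 characters of
    # "=?utf-8?b?" and "?=" chrome, so 16 of them (76 characters) fit into 78.
    print(longest_fitting_prefix("虾" * 40, 78))  # 16
```

Each probe re-encodes a prefix, so the search costs a logarithmic number of `encode` calls instead of relying on the old `first_part[:-excess]` estimate.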
6 changes: 6 additions & 0 deletions Lib/test/test_email/test__header_value_parser.py
@@ -2687,6 +2687,12 @@ def test_unstructured_with_unicode_no_folds(self):
         self._test(parser.get_unstructured("hübsch kleiner beißt"),
                    "=?utf-8?q?h=C3=BCbsch_kleiner_bei=C3=9Ft?=\n")
 
+    def test_unstructured_with_long_unicode_folded(self):
+        self._test(parser.get_unstructured("虾" * 40),
+                   "=?utf-8?b?" + "6Jm+" * 16 + "?=\n"
+                   " =?utf-8?b?" + "6Jm+" * 16 + "?=\n"
+                   " =?utf-8?b?" + "6Jm+" * 8 + "?=\n")
+
     def test_one_ew_on_each_of_two_wrapped_lines(self):
         self._test(parser.get_unstructured("Mein kleiner Kaktus ist sehr "
                                            "hübsch. Es hat viele Stacheln "
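
The same input can also be exercised through the public header API rather than the parser internals. The sketch below is an illustration under stated assumptions, not part of the test suite: `fold_subject` is a hypothetical helper, and because the exact split points depend on the Python version, it only inspects whether folding terminates and how long each folded line is.

```python
from email import policy
from email.headerregistry import HeaderRegistry


def fold_subject(text, pol=policy.default):
    """Build a Subject header from text and fold it under the given policy."""
    header = HeaderRegistry()("Subject", text)
    return header.fold(policy=pol)


if __name__ == "__main__":
    folded = fold_subject("虾" * 40)
    for line in folded.splitlines():
        # Each folded line is expected to stay within pol.max_line_length.
        print(len(line), line)
```

On an interpreter affected by the bug, the call to `fold` is the step that would not return.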
8 changes: 4 additions & 4 deletions Lib/test/test_email/test_headerregistry.py
@@ -1643,10 +1643,10 @@ def test_fold_overlong_words_using_RFC2047(self):
         self.assertEqual(
             h.fold(policy=policy.default),
             'X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2E'
-            'com/report=5F?=\n'
-            ' =?utf-8?q?abuse=2Ephp=3Fmid=3Dxxx-xxx-xxxx'
-            'xxxxxxxxxxxxxxxxxxxx=3D=3D-xxx-?=\n'
-            ' =?utf-8?q?xx-xx=3E?=\n')
+            'com/report=5Fabuse?=\n'
+            ' =?utf-8?q?=2Ephp=3Fmid=3Dxxx-xxx-xxxx'
+            'xxxxxxxxxxxxxxxxxxxx=3D=3D-xxx-xx-xx?=\n'
+            ' =?utf-8?q?=3E?=\n')
 
 
 if __name__ == '__main__':
1 change: 1 addition & 0 deletions Misc/ACKS
@@ -1599,6 +1599,7 @@ Anish Tambe
 Musashi Tamura
 William Tanksley
 Christian Tanzer
+Pengyu Tao
 Steven Taschuk
 Amy Taylor
 Julian Taylor
@@ -0,0 +1 @@
+Fix infinite loop when folding non-ASCII email headers