
bpo-34222: Lib/email: Fix infinite loop when folding #8990


Closed

@Xiami2012 Xiami2012 commented Aug 29, 2018

Currently, when folding headers longer than maxlen, _fold_as_ew tries
to split to_encode into multiple parts to satisfy the maxlen limit,
but does so in an inappropriate way.

If a long header has non-ASCII characters, in some situations (e.g. a
Subject: consisting entirely of CJK characters), it will split to_encode
into ["", to_encode], entering an infinite loop.

This commit fixes this by introducing a smarter way to split.
In addition, when a header needs to be folded, every line except the
last now gets as close to maxlen as possible, in O(log N) time.
Also, pass the missing charset= parameter to _ew.encode.
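
For readers following along, the splitting idea described above can be sketched roughly as follows. This is not the code from this PR: `b64_word_len`, `longest_prefix_that_fits` and `split_to_fit` are hypothetical names, the encoded-word length is modelled with plain base64 rather than `_ew.encode`, and the search assumes the encoded length never decreases as the prefix grows.

```python
import base64


def b64_word_len(text, charset="utf-8"):
    """Length of text as a single base64 encoded-word (a stand-in model)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return len(f"=?{charset}?b?{payload}?=")


def longest_prefix_that_fits(text, budget, encoded_len=b64_word_len):
    """Binary-search the largest n with encoded_len(text[:n]) <= budget.

    Assumes encoded_len is non-decreasing in the prefix length.
    Returns 0 when not even one character fits.
    """
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if encoded_len(text[:mid]) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return lo


def split_to_fit(text, budget, encoded_len=b64_word_len):
    """Split text into chunks whose encoded form fits within budget."""
    chunks = []
    while text:
        n = longest_prefix_that_fits(text, budget, encoded_len)
        if n == 0:
            # Guard against the ["", to_encode] split that never shrinks
            # the remaining text and would therefore loop forever.
            raise ValueError("budget too small for a single character")
        chunks.append(text[:n])
        text = text[n:]
    return chunks


subject = "中" * 40
chunks = split_to_fit(subject, budget=60)
```

The explicit `n == 0` check is the important part of the sketch: whatever search strategy is used, the loop has to consume at least one character per iteration or fail loudly.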

The bug was introduced in commit 85d5c18

https://bugs.python.org/issue34222

@Xiami2012 Xiami2012 requested a review from a team as a code owner August 29, 2018 11:06
@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

if len(ew) > remaining_space:
    # Find the longest first_part
    # since len(_ew.encode(to_encode[:x])) is a non-linear
    # monotonically increasing function, and calculating the


_ew.encode is biased towards the 'q' encoding. This might violate the assumption of a monotonically increasing function for some corner cases. (This was already the case for the old code.)

I hope to find the time to write a test case for this.
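
Until such a test case exists, one way to probe the concern is to scan the prefixes of a sample string and record every point where the encoded length goes down. The sketch below is only an assumption about how that check might look: `find_length_drops` and `ew_len` are hypothetical helpers, and `email._encoded_words` is a private CPython module whose behaviour may differ between versions.

```python
from email import _encoded_words as _ew  # private CPython module


def ew_len(text, charset="utf-8"):
    """Encoded-word length when _ew.encode picks the encoding itself."""
    return len(_ew.encode(text, charset=charset))


def find_length_drops(text, encoded_len=ew_len):
    """Return (prefix_len, previous_len, current_len) for every prefix
    whose encoded length is shorter than that of the prefix before it."""
    drops = []
    prev = encoded_len(text[:1])
    for i in range(2, len(text) + 1):
        cur = encoded_len(text[:i])
        if cur < prev:
            drops.append((i, prev, cur))
        prev = cur
    return drops


# Example probe: a mix of non-ASCII and ASCII characters, where the
# automatic choice between 'q' and 'b' is most likely to flip.
sample = "é" + "a" * 30
drops = find_length_drops(sample)
```

An empty result for one sample does not prove monotonicity; it only means that sample did not trigger a drop. Any non-empty result would be a candidate input for the test case mentioned above.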

Contributor

csabella commented Jun 5, 2019

Thank you for the contribution. This was fixed in GH-12020, so I'm closing this as a duplicate.

@csabella csabella closed this Jun 5, 2019

5 participants