gh-118718: Incorrect decoding of preamble in email parser #134384

Open: wants to merge 7 commits into base: main
3 changes: 2 additions & 1 deletion Lib/email/parser.py
@@ -100,7 +100,8 @@ def parse(self, fp, headersonly=False):
         parsing after reading the headers or not. The default is False,
         meaning it parses the entire contents of the file.
         """
-        fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape')
+        encoding = "utf-8" if getattr(self.parser.policy, "utf8", False) else "ascii"
+        fp = TextIOWrapper(fp, encoding=encoding, errors='surrogateescape')
         try:
             return self.parser.parse(fp, headersonly)
         finally:
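To illustrate why the codec passed to `TextIOWrapper` matters, here is a minimal sketch that decodes the same UTF-8 bytes with both codecs under `errors='surrogateescape'`. It does not call the email package; the helper `decode_with` and the constant `RAW` are hypothetical names introduced only for this illustration.

```python
from io import BytesIO, TextIOWrapper

# UTF-8 bytes for a line containing a non-ASCII character ("é" is b"\xc3\xa9").
RAW = "This is the préamble.\n".encode("utf-8")


def decode_with(encoding: str) -> str:
    """Decode RAW through a TextIOWrapper using the given codec (illustrative helper)."""
    wrapper = TextIOWrapper(BytesIO(RAW), encoding=encoding, errors="surrogateescape")
    try:
        return wrapper.read()
    finally:
        # Detach so the wrapper does not close the underlying buffer.
        wrapper.detach()


ascii_text = decode_with("ascii")
utf8_text = decode_with("utf-8")

# ascii() escapes non-ASCII characters, so lone surrogates print safely.
print(ascii(ascii_text))
print(ascii(utf8_text))

# surrogateescape is lossless: re-encoding restores the original bytes.
assert ascii_text.encode("ascii", "surrogateescape") == RAW
```

With the `ascii` codec each non-ASCII byte becomes a lone surrogate code point, so the text no longer contains `é` until it is re-encoded; with `utf-8` the character is decoded directly.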
38 changes: 38 additions & 0 deletions Lib/test/test_email/test_email.py
@@ -3990,6 +3990,44 @@ def test_bytes_parser_on_exception_does_not_close_file(self):
                               fp)
             self.assertFalse(fp.closed)

+    def test_bytes_parser_uses_policy_utf8_setting(self):
+        m = """
+From: Nathaniel Nameson <nathan@nameson.com>
+To: Ned Sampleson <ned@sampleson.com>
+Subject: Sample message
+MIME-Version: 1.0
+Content-type: multipart/mixed; boundary="i-am-boundary"
+
+This is the préamble. It is to be ignored, though it
+is a handy place for mail composers to include an
+explanatory note to non-MIME compliant readers.
+
+--i-am-boundary
+Content-type: text/plain; charset=us-ascii
+
+This is explicitly typed plain ASCII text.
+It DOES end with a linebreak.
+
+--i-am-boundary
+Content-type: text/plain; charset=utf-8
+Content-Transfer-Encoding: 8bit
+
+This should be correctly encapsulated: Un petit café ?
+
+--i-am-boundary--
+This is the epilogue. It is also to be ignored.
+
+""".lstrip()
+        M_BYTES = BytesIO(m.encode())
+
+        msg = email.message_from_binary_file(M_BYTES, policy=email.policy.default.clone(utf8=True))
+        for i, part in enumerate(msg.iter_parts(), 1):
+            _ = part.as_string()
+
+        msg_string = msg.as_string()
+        self.assertIn("This is the préamble.", msg_string)
+        self.assertIn("Un petit café", msg_string)
+
     def test_parser_does_not_close_file(self):
         with openfile('msg_02.txt', encoding="utf-8") as fp:
             email.parser.Parser().parse(fp)
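The test opts in to UTF-8 decoding through `email.policy.default.clone(utf8=True)`. As a rough sketch of how the `getattr`-based selection in the parser change would behave for the standard policy objects, the snippet below reproduces that one expression in a standalone helper. The name `pick_encoding` is hypothetical and is not part of the email package.

```python
import email.policy


def pick_encoding(policy) -> str:
    """Mirror the expression from the diff: UTF-8 only when policy.utf8 is true."""
    # getattr with a default keeps this safe for policy objects
    # that do not define a `utf8` attribute.
    return "utf-8" if getattr(policy, "utf8", False) else "ascii"


policies = {
    "compat32": email.policy.compat32,
    "default": email.policy.default,
    "default.clone(utf8=True)": email.policy.default.clone(utf8=True),
    "SMTPUTF8": email.policy.SMTPUTF8,
}

for name, policy in policies.items():
    print(f"{name}: {pick_encoding(policy)}")
```

Only policies with `utf8` set to a true value select UTF-8, so parsing with `compat32` or the unmodified `default` policy would keep the existing ASCII behavior.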
@@ -0,0 +1,2 @@
+Fix incorrect decoding of preamble in BytesParser
+Contributed by Gustaf Gyllensporre.