UTF-8 Email parsing/serialising: Roundtrip exits with “surrogates not allowed” #113594

# Bug report

### Bug description:

In the minimal Python example below, `email_raw_1` survives a round trip from a UTF-8 byte string to an `EmailMessage` object and back to a string, while `email_raw_2` fails during serialisation:

    Traceback (most recent call last):
      File "//surrogate_issue.py", line 29, in <module>
        print(message_2)
      …
      File "/usr/local/lib/python3.12/email/_encoded_words.py", line 224, in encode
        bstring = string.encode(charset)
                  ^^^^^^^^^^^^^^^^^^^^^^
    UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-2: surrogates not allowed


The only difference between the two inputs is one additional digit: `email_raw_2` has an extra trailing `0` on the line containing `您`.
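
For context, here is a small sketch of what the error message itself means, independent of the `email` package. It assumes (this is my reading, not something the traceback states) that the three characters at position 0-2 are the three UTF-8 bytes of `您` carried as lone surrogates by the `surrogateescape` error handler, which the email parser uses for non-ASCII bytes in headers. The variable names are only for illustration.

```python
# The UTF-8 encoding of the character from the example is three bytes long.
raw = "您".encode("utf-8")

# Decoding as ASCII with surrogateescape maps each non-ASCII byte to one
# lone surrogate code point (U+DC80..U+DCFF) instead of failing.
smuggled = raw.decode("ascii", errors="surrogateescape")

# A strict UTF-8 encode refuses lone surrogates.
try:
    smuggled.encode("utf-8")
    error_text = ""
except UnicodeEncodeError as exc:
    error_text = str(exc)

# Encoding with the same error handler recovers the original bytes.
restored = smuggled.encode("ascii", errors="surrogateescape")

print(ascii(smuggled))
print(error_text)
print(restored == raw)
```

If that reading is right, the strict `string.encode(charset)` call in the traceback is being handed a string that still contains such escaped bytes.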

The email is malformed, but it is taken from an actual mail at <https://wilson.bronger.org/5105.txt>. Malformed or not, my other email machinery can deal with it, so I think Python should handle such real-world specimens on a best-effort basis rather than raising an exception.



```python
#!/usr/bin/env python3

import email
import email.policy

email_raw_1 = """Content-Type: multipart/mixed; boundary="==="

--===
Content-Type: message/plain
 
 您0123456789012.3456789

--===--
""".encode()

email_raw_2 = """Content-Type: multipart/mixed; boundary="==="

--===
Content-Type: message/plain
 
 您0123456789012.34567890

--===--
""".encode()

message_1 = email.message_from_bytes(email_raw_1, policy=email.policy.SMTPUTF8)
message_2 = email.message_from_bytes(email_raw_2, policy=email.policy.SMTPUTF8)
print(message_1)  # serialises without error
print(message_2)  # raises UnicodeEncodeError: surrogates not allowed
```
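
Until this is handled inside the `email` package, a caller-side workaround along the following lines may help. It is only a sketch of the "best effort" idea, not a proposed fix: `serialise_best_effort` is a hypothetical helper name, and falling back to the leniently decoded raw input is my own assumption about what a caller would accept. The demonstration uses a well-formed message so that it runs the same way on any Python version.

```python
import email
import email.policy


def serialise_best_effort(message, raw: bytes) -> str:
    """Hypothetical helper: serialise `message`, falling back to the raw input."""
    try:
        return message.as_string()
    except UnicodeEncodeError:
        # Serialisation failed on escaped header bytes; return the original
        # text instead, replacing any bytes that are not valid UTF-8.
        return raw.decode("utf-8", errors="replace")


raw = b"Subject: hello\n\nbody\n"
message = email.message_from_bytes(raw, policy=email.policy.SMTPUTF8)
text = serialise_best_effort(message, raw)
print(text)
```

Passing `message_2` and `email_raw_2` from the example above would take the fallback branch on an affected version, at the cost of returning the unparsed text rather than a re-serialised message.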


### CPython versions tested on:

3.12

### Operating systems tested on:

Linux


### Linked PRs
* gh-113639
* gh-113730
* gh-113907
* gh-113908
