[Mailer][Mime] Support unicode email addresses #58361

Merged
merged 7 commits into from
Oct 6, 2024

arnt
Contributor

@arnt arnt commented Sep 23, 2024

| Q | A |
| --- | --- |
| Branch? | 7.2 |
| Bug fix? | no |
| New feature? | yes |
| Deprecations? | no |
| License | MIT |

This allows applications to send mail to all-Chinese email addresses, or to addresses like my test address grå@grå.org. Code that uses Symfony needs no change and should experience no difference, although if the upstream MTA doesn't support it (most do by now), an exception is thrown slightly later than before this change.

Before this commit, Envelope would throw InvalidArgumentException when a
unicode sender address was used. Now, that error is thrown slightly later,
is thrown for recipient addresses as well, but is not thrown if the
next-hop server supports SMTPUTF8.

As a side effect, transports that use JSON APIs to ESPs can also use
unicode addresses if the ESP supports that (many do, many don't).
@carsonbot

Hey!

I see that this is your first PR. That is great! Welcome!

Symfony has a contribution guide which I suggest you read.

In short:

  • Always add tests
  • Keep backward compatibility (see https://symfony.com/bc).
  • Bug fixes must be submitted against the lowest maintained branch where they apply (see https://symfony.com/releases)
  • Features and deprecations must be submitted against the 7.2 branch.

Review the GitHub status checks of your pull request and try to solve the reported issues. If some tests are failing, try to see if they are failing because of this change.

When two Symfony core team members approve this change, it will be merged and you will become an official Symfony contributor!
If this PR is merged in a lower version branch, it will be merged up to all maintained branches within a few days.

I am going to sit back now and wait for the reviews.

Cheers!

Carsonbot

@carsonbot carsonbot changed the title Support unicode email addresses [Mailer][Mime] Support unicode email addresses Sep 23, 2024
Also fix one mysteriously broken unit test.
@arnt
Contributor Author

arnt commented Sep 23, 2024

I pushed a new commit resolving all received comments. Thanks!

@stof
Member

stof commented Sep 24, 2024

Please fix the coding standards to follow the Symfony coding standards (see the fabbot.io check)

* The SMTPUTF8 extension is strictly required if any address
* contains a non-ASCII character in its localpart. If non-ASCII
* is only used in domains (e.g. horst@freiherr-von-mühlhausen.de)
* then it is possible to to send the message using IDN encoding
Member

Suggested change
- * then it is possible to to send the message using IDN encoding
+ * then it is possible to send the message using IDN encoding

@@ -44,10 +44,6 @@ public static function create(RawMessage $message): self

public function setSender(Address $sender): void
{
// to ensure deliverability of bounce emails independent of UTF-8 capabilities of SMTP servers
Member

Would bounces still work fine if your SMTP server supports SMTPUTF8 but the target server does not?

Contributor Author

Yes, you get a 5xx error when the supporting SMTP client sends to the unsupporting SMTP server.

The Symfony code is written to minimize the chance of running into this at all, though. If you send to e.g. info@grå.org, Symfony will be able to send that even to an unsupporting server. That's why the code tests for non-ASCII in the localpart (not the entire address). This is the same approach as e.g. Exchange.
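
To illustrate the localpart test described in this comment, here is a minimal Python sketch. It is not Symfony's PHP code; the function name `needs_smtputf8` is hypothetical, and it assumes the localpart is everything before the last "@".

```python
def needs_smtputf8(address: str) -> bool:
    """True if the localpart (text before the last "@") has non-ASCII characters."""
    localpart, _, _domain = address.rpartition("@")
    # The domain is deliberately ignored: it can be IDN-encoded instead.
    return not localpart.isascii()


for address in ("noreply@example.com", "info@grå.org", "grå@grå.org"):
    print(address, needs_smtputf8(address))
```

Only the last of the three addresses has a non-ASCII localpart, so only it would require the extension under this rule.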

Contributor Author

I should elaborate. The 5xx error means that the server that generates the DSN is one that supports SMTPUTF8. It will generate a bounce that does not require SMTPUTF8 in order to be delivered, and which contains UTF8 in the body text.

It works quite well.

Member

The Sender will become the recipient of the bounce. That's why I'm wondering how this would behave.

Contributor Author

If the sender uses an ASCII address (highly advisable at this time), then the bounce does not require SMTPUTF8. Delivering the DSN is simple.

If the sender uses a non-ASCII address, then Symfony's upstream MTA will generate a DSN that uses SMTPUTF8, but in this case we know that the sender has support for that.

Contributor Author

Bounces work reliably. My job (at ICANN) is one where I get to hear about this kind of problem ;)

If the recipient's server receives the message at all, then it supports SMTPUTF8. An extended server never forwards an SMTPUTF8 message to an unextended server. (This is a nontrivial design decision, and was made only after a large testbed experiment.)

This means that if any server along the path needs (or chooses) to bounce the message, then it has SMTPUTF8 support.

I wonder whether it makes sense to enforce ASCII in the sender's localpart… let me sleep on that, please. AIUI Symfony is used mostly to send mail from servers to users? Like a web server's noreply@example.com? Is Symfony also used by scripts that people like us run on the command line or from cron? There's a five-digit number of domains that actively use unicode email addresses, maybe even six-digit.

Member

@stof stof Sep 24, 2024

I have no idea what people build using symfony/mailer (nothing forbids you from writing a PHP script meant to be run on the command line, and Symfony does not call home to tell us that you wrote such a script). However, my intuition is that at least 99% of usages (and probably much more than that) are about sending from a server.

If the recipient's server receives the message at all, then it supports SMTPUTF8. An extended server never forwards an SMTPUTF8 message to an unextended server. (This is a nontrivial design decision, and was made only after a large testbed experiment.)

that's actually a good context to have to understand how this works.

Contributor Author

I thought about forbidding UTF8 sender addresses, and…

I think the key here is that when people make the mistake of sending "here's the link to change password" or "your order has shipped" from a unicode address, they understand it really quickly. When you do that, maybe 20% of your outgoing mail bounces, and you change your configuration later on the same day.

It's not the kind of mistake that causes slow trouble over a long time, it's the kind of mistake that causes a lot of trouble immediately.

If, on the other hand, you write a script to process, sort and forward inbound mail, then you may not receive any from a unicode address very soon, and a limitation on sender address is one that shows up slowly, after a while, and seldom.

For this reason, I think the argument to forbid UTF8 sender addresses is fairly weak. But you can judge it better than I — my expertise is in unicode email and domain names, you know Symfony users and traditions.

Member

If, on the other hand, you write a script to process, sort and forward inbound mail, then you may not receive any from a unicode address very soon, and a limitation on sender address is one that shows up slowly, after a while, and seldom.

wouldn't those fail DMARC checks if you forward them using the original sender?

I would vote for keeping the restriction on UTF8 senders, which will give immediate feedback to devs instead of waiting for them to run into delivery trouble once they reach production (if they attempt to use a Unicode sender, it is likely that their own email servers will support it, so they won't get delivery issues in dev/staging).

Contributor Author

I've never seen any of those scripts do external forwarding, which would subject the message to DMARC tests. People write code to sort mail to info@… into different classes, route some to autoresponders and some to various local addresses. If the body text mentions product x, the message is forwarded to address y, etc. Some companies make sure a Key Account Representative gets a copy of mail to/from key customers.

Some mail servers can do that with rules, but if a company employs developers, it's often done in the language those developers use.

I'll add another commit that reinstates the restriction on the localpart, with a unit test or two that make sure of compatibility with all receivers. You can include or exclude that commit when you merge the PR (I do hope you'll merge). IMO both approaches are good, the choice of best is a matter of Symfony's philosophy.

This commit also adds a test that Symfony chooses IDN encoding when
possible (to be compatible with all email receivers), and adjusts a couple
of tests to match the name used in the main source code.
@javiereguiluz javiereguiluz added the ❄️ Feature Freeze Important Pull Requests to finish before the next Symfony "feature freeze" label Oct 3, 2024
@fabpot
Member

fabpot commented Oct 6, 2024

Thank you @arnt.

@fabpot fabpot merged commit 4a9c2e8 into symfony:7.2 Oct 6, 2024
10 checks passed
@fabpot fabpot mentioned this pull request Oct 27, 2024
@ThomasLandauer
Contributor

I'd like to question how this PR is explained to users in the docs.

I'm talking about the setup which we all think is the most common ("99%"):

Symfony  <-->  My SMTP server  <-->  Receiving SMTP server

This PR does two things:

  1. If my SMTP server announces 250-SMTPUTF8 in its EHLO reply, Symfony adds the SMTPUTF8 keyword to the MAIL FROM command and submits the entire message in UTF8 (i.e. localpart, domain, content).
  2. If my SMTP server does not announce 250-SMTPUTF8, Symfony converts the domain to punycode and submits the message as ASCII.

Right?

In case 1, if the receiving SMTP server doesn't support SMTPUTF8, my SMTP server will send me a bounce DSN back. Right?
This is the crucial part IMO.
Cause these bounces could be avoided, if the non-ASCII characters are just in the domain!
(I'm assuming that no MTA will analyze the entire message and convert the domain to punycode to be able to deliver it to a non-SMTPUTF8 server.)

From my observation I would estimate that far from all mail servers support SMTPUTF8 (I would say around 50%); can you confirm this?
Which leads me to the bottom line:
If the non-ASCII characters are just in the domain, you can boost the deliverability if you manually convert it to punycode.

You can argue that people that have an IDN domain will probably have an SMTPUTF8-enabled mail server. (I don't have any data about this, do you?) But I'm not sure about this, since punycode is so widely used (in many other systems as well).

So instead of telling users that Symfony now fully supports non-ASCII email addresses (see https://symfony.com/doc/current/mailer.html#email-addresses), I would rather advise them to add idn_to_ascii() themselves :-)

What do you think?
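
For readers unfamiliar with the manual conversion suggested here, the following Python sketch shows the idea. It is an analogue of PHP's `idn_to_ascii()`, not the same function: it uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules, and the helper names `domain_to_ascii` and `domain_to_unicode` are hypothetical.

```python
def domain_to_ascii(address: str) -> str:
    """Punycode-encode the domain; the localpart is left untouched."""
    localpart, _, domain = address.rpartition("@")
    return f"{localpart}@{domain.encode('idna').decode('ascii')}"


def domain_to_unicode(address: str) -> str:
    """Reverse the encoding, e.g. to display an address to a user."""
    localpart, _, domain = address.rpartition("@")
    return f"{localpart}@{domain.encode('ascii').decode('idna')}"


print(domain_to_ascii("info@grå.org"))
print(domain_to_unicode("info@xn--gr-zia.org"))
```

The conversion only helps when the localpart is ASCII. An address such as grå@grå.org still has a non-ASCII localpart after its domain is encoded.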

@arnt
Contributor Author

arnt commented Dec 3, 2024

Hi @ThomasLandauer

a couple of things.

I have data about SMTP. I work with this, for ICANN. In short: If you as a random internet user want to send mail to someone who uses a unicode address, the chance that their incoming server supports SMTPUTF8 is practically 100%; the chance that your outgoing server supports it is 80% or a little higher, as a worldwide average.

That's for people worldwide — the outgoing server composition for servers like Symfony is different than for humans. Humans can use Yahoo, web servers can use Sparkpost, see? The composition for diligently upgraded servers is different again. It's difficult to count that, though.

You're slightly wrong about what the PR does, BTW. The condition for adding the SMTPUTF8 keyword isn't that a server supports it, but rather that the destination address requires it.
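
The distinction can be sketched in Python as follows. This is an illustration under assumptions, not the code merged in this PR: all function names are hypothetical, the EHLO reply is assumed to be available as a list of lines, and any extra restriction on the sender's localpart is ignored.

```python
from __future__ import annotations


def localpart_is_ascii(address: str) -> bool:
    """True if everything before the last "@" is plain ASCII."""
    return address.rpartition("@")[0].isascii()


def server_announces_smtputf8(ehlo_reply: list[str]) -> bool:
    """Scan EHLO reply lines such as "250-SMTPUTF8" for the keyword."""
    for line in ehlo_reply:
        # Skip the 3-digit code and the "-" or " " separator.
        words = line[4:].split()
        if words and words[0].upper() == "SMTPUTF8":
            return True
    return False


def mail_from_command(sender: str, recipients: list[str], ehlo_reply: list[str]) -> str:
    """Build the MAIL FROM line, adding SMTPUTF8 only when an address needs it."""
    required = not all(localpart_is_ascii(a) for a in [sender, *recipients])
    if not required:
        # No localpart needs UTF-8, so IDN-encode the domain and stay ASCII.
        local, _, domain = sender.rpartition("@")
        return f"MAIL FROM:<{local}@{domain.encode('idna').decode('ascii')}>"
    if not server_announces_smtputf8(ehlo_reply):
        raise ValueError("an address requires SMTPUTF8, but the server does not announce it")
    return f"MAIL FROM:<{sender}> SMTPUTF8"
```

In this sketch the server's capabilities are consulted only after an address has been found to need the extension, so a message to info@grå.org is sent the same way whether or not the server announces SMTPUTF8.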

Anyway, I spoke to a large mail company in China a few weeks ago, they just don't see the errors you mention any more, and I know a couple of implementations (one that I wrote) for which it also appears to be a no-op. Punycode does nothing for the deliverability.

I can guess why it is a nonproblem. This is guesswork, not based on data:

  • An address such as info@grå.org can be served without SMTPUTF8. But if you start serving mail for grå.org, I guess that fairly soon you'll want an address like grå@grå.org, and at that point your server needs SMTPUTF8.

  • Similar for a server that wants to send to generic email addresses. Punycode helps during the brief interval from when you first need to send to info@grå.org until you first need to send to grå@grå.org.

That small value doesn't outweigh the permanent cost of punycode in interoperability risks and showing users xn--foo-43243129.

@ThomasLandauer
Contributor

But if you start serving mail for grå.org, I guess that fairly soon you'll want an address like grå@grå.org

This is exactly the point that I'm questioning.
In the German-speaking area (i.e. the inventors of umlauts ;-) the most common localparts are general-purpose terms like "office@", "info@", "service@", etc. So if somebody's domain is müller.xy, there is a chance that they will never use müller@müller.xy.
I can imagine that this is not true for Chinese, Arabic, etc.: I would guess that they use their characters either for everything or for nothing. Nobody wants an address info@国中互 - right?

You're slightly wrong about what the PR does, BTW. The condition for adding the SMTPUTF8 keyword isn't that a server supports it, but rather that the destination address requires it.

Well, if the localpart is ASCII, then SMTPUTF8 isn't really required, since there's an alternative (punycode).

So, after some more thinking about it, I would say: Symfony should behave like common MUAs (Thunderbird, Outlook, etc.) behave. Cause that's probably what most users expect: If it works in Thunderbird, it should work in Symfony.

The only problem is: I don't know how Thunderbird handles it ;-) Do you?

@arnt
Contributor Author

arnt commented Dec 4, 2024

Thunderbird sends UTF8. The main punycode senders are Mutt and Exchange. I think there's a third notable one, can't remember which one that is. UTF8 is the majority anyway.

(I'll ask a colleague to survey how many of the relevant mail servers accept the punycode form of an address. It should be 100%, but sometimes people read documentation like this, test with gmail and don't notice that they've forgotten to add the xn-- form to the list of domains. Would be good to check how common that is.)

Labels
Feature Mailer Mime ❄️ Feature Freeze Important Pull Requests to finish before the next Symfony "feature freeze" Status: Reviewed