FIX: Disallow encoded words in e-mail addresses #33083

Merged
Drenmi merged 1 commit into main from dev/disallow-encoded-words-in-emails
Jun 5, 2025

Conversation

Drenmi
Contributor

@Drenmi Drenmi commented Jun 5, 2025

What is this change?

RFC 5322 allows special characters, including ? and =, to be used in e-mail addresses.

RFC 2047 is an extension that adds a feature called "encoded words", which lets you embed text in different character sets and encodings in the same header. However, it explicitly says that an encoded word must not appear in any part of the e-mail address itself (the addr-spec).

Encoded words have the format:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

Where encoding is either Q or B, but could take on other values in the future.

After this change, any e-mail address that contains an encoded word is considered invalid.
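
As a rough illustration of the format above, an encoded word can be detected with a single regular expression. This is a minimal sketch, not the code from this PR: the constant `ENCODED_WORD_PATTERN` and the helper `contains_encoded_word?` are hypothetical names, and the character classes are a simplification of the RFC 2047 grammar (any run of characters other than `?` and whitespace).

```ruby
# Hypothetical sketch, not the regex introduced by this PR.
# Matches: "=?" charset "?" encoding "?" encoded-text "?="
# Simplification: each part is one or more characters that are
# neither "?" nor whitespace, so future encodings are covered too.
ENCODED_WORD_PATTERN = /=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=/

# Returns true if the string contains an encoded word anywhere.
def contains_encoded_word?(address)
  ENCODED_WORD_PATTERN.match?(address)
end

contains_encoded_word?("te=?utf-8?q?st?=@discourse.org") # => true
contains_encoded_word?("test@discourse.org")              # => false
contains_encoded_word?("a?b=c@discourse.org")             # => false
```

The last call shows that `?` and `=` on their own remain acceptable, as RFC 5322 permits; only the full `=?...?...?...?=` shape is flagged.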

@tgxworld tgxworld requested a review from Copilot June 5, 2025 02:53
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the email validator to reject RFC 2047 encoded-word syntax in addresses, and adds a corresponding spec case.

  • Add a negative check against encoded-word patterns in valid_value?
  • Introduce encoded_word_regex to detect =?charset?encoding?encoded-text?= fragments
  • Add a spec example for a q-encoded word in the local part
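
The shape of the change summarized above might look roughly like the following. This is a sketch under stated assumptions: `SketchEmailValidator` is a hypothetical stand-in for the real validator class, `email_regex` here is a deliberately loose placeholder, and the body of `encoded_word_regex` is a simplified guess rather than the pattern merged in this PR.

```ruby
# Hypothetical stand-in for the validator touched by this PR.
module SketchEmailValidator
  # Assumption: a loose format check used only for illustration.
  def self.email_regex
    /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/
  end

  # Assumption: simplified RFC 2047 matcher; the real regex may differ.
  def self.encoded_word_regex
    /=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=/
  end

  # Valid only if the format matches AND no encoded word is present.
  def self.valid_value?(email)
    email.match?(email_regex) && !email.match?(encoded_word_regex)
  end
end

SketchEmailValidator.valid_value?("test@discourse.org")              # => true
SketchEmailValidator.valid_value?("te=?utf-8?q?st?=@discourse.org") # => false
```

Keeping the encoded-word test as a separate negative check means the positive format regex does not have to change.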

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
spec/lib/validators/email_address_validator_spec.rb | Add a test for a q-encoded word in the local part
lib/validators/email_address_validator.rb | Update valid_value? to reject encoded words and define its regex
Comments suppressed due to low confidence (1)

spec/lib/validators/email_address_validator_spec.rb:19

  • Only a lowercase q-encoding is tested. Add cases for uppercase Q, both B/b encodings, and different charset names to ensure full RFC 2047 coverage.
"te=?utf-8?q?st?=@discourse.org"
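
The additional cases suggested here could be exercised along these lines. This is a sketch only: `ENCODED_WORD_SKETCH` is a hypothetical, simplified matcher (not the regex from this PR), and the sample addresses other than the first are made up for illustration.

```ruby
# Hypothetical, simplified matcher used only to illustrate the cases.
ENCODED_WORD_SKETCH = /=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=/

# address => whether an encoded word should be detected
CASES = {
  "te=?utf-8?q?st?=@discourse.org"          => true,  # lowercase q
  "te=?utf-8?Q?st?=@discourse.org"          => true,  # uppercase Q
  "=?utf-8?b?dGVzdA==?=@discourse.org"      => true,  # lowercase b (base64)
  "=?ISO-8859-1?B?dGVzdA==?=@discourse.org" => true,  # uppercase B, other charset
  "te?st=@discourse.org"                    => false, # "?" and "=" alone are fine
}

# Collect every case whose result differs from the expectation.
mismatches = CASES.reject do |address, expected|
  ENCODED_WORD_SKETCH.match?(address) == expected
end

puts mismatches.empty? ? "all cases behave as expected" : "mismatches: #{mismatches.keys}"
```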

@tgxworld
Contributor

tgxworld commented Jun 5, 2025

@Drenmi I do think this should be under the FIX: prefix instead of DEV: as it is a bug we are fixing.

@Drenmi Drenmi force-pushed the dev/disallow-encoded-words-in-emails branch 2 times, most recently from d113193 to 4408123 Compare June 5, 2025 03:01
@Drenmi Drenmi changed the title DEV: Disallow encoded words in e-mail addresses FIX: Disallow encoded words in e-mail addresses Jun 5, 2025
@Drenmi Drenmi force-pushed the dev/disallow-encoded-words-in-emails branch from 4408123 to d456569 Compare June 5, 2025 03:02
@Drenmi Drenmi force-pushed the dev/disallow-encoded-words-in-emails branch from d456569 to 1fd6cd2 Compare June 5, 2025 03:21
@Drenmi Drenmi merged commit 60a3fe4 into main Jun 5, 2025
15 of 16 checks passed
@Drenmi Drenmi deleted the dev/disallow-encoded-words-in-emails branch June 5, 2025 04:58
martin-brennan pushed a commit that referenced this pull request Jun 10, 2025