Some edge cases in `email.utils.parsedate_to_datetime` seem to differ from RFC2822 spec #126845

ariebovenberg · 2024-11-14T22:02:25Z

Bug report

Bug description:

While tinkering around with email.utils.parsedate_to_datetime, I found some behavior that may be worth adjusting.

1. low-number years aren't handled according to spec:

The year is any numeric year 1900 or later. [section 3.3]

[section 4.3] The syntax for the obsolete date format allows a 2 digit year.
[..]
Where a two or three digit year occurs in a date, the year is to be
interpreted as follows: If a two digit year is encountered whose
value is between 00 and 49, the year is interpreted by adding 2000,
ending up with a value between 2000 and 2049. If a two digit year is
encountered with a value between 50 and 99, or any three digit year
is encountered, the year is interpreted by adding 1900.

>>> parsedate_to_datetime("Sat, 15 Aug 0001 23:12:09 +0500")
datetime.datetime(2001, 8, 15, 23, 12, 9, ...)

expected: either year 1, or a parsing failure. Neither the new nor the old format interprets 4-digit years this way.
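
To make the quoted rule concrete, here is a minimal sketch of the year interpretation as I read sections 3.3 and 4.3. The helper name `interpret_rfc2822_year` is hypothetical and is not part of `email.utils`; it only illustrates the strict reading, where a four-digit year below 1900 is rejected.

```python
def interpret_rfc2822_year(year_text: str) -> int:
    """Map the year field of an RFC 2822 date to a calendar year.

    Hypothetical helper for illustration only; not the stdlib implementation.
    """
    if not (year_text.isascii() and year_text.isdigit()):
        raise ValueError(f"not an ASCII numeric year: {year_text!r}")
    year = int(year_text)
    if len(year_text) == 2:
        # Obsolete two-digit form: 00-49 -> 2000-2049, 50-99 -> 1950-1999.
        return year + 2000 if year < 50 else year + 1900
    if len(year_text) == 3:
        # Obsolete three-digit form: always add 1900.
        return year + 1900
    if len(year_text) >= 4:
        # Current form: "any numeric year 1900 or later".
        if year < 1900:
            raise ValueError(f"year before 1900: {year_text!r}")
        return year
    # A single digit matches neither the current nor the obsolete syntax.
    raise ValueError(f"year needs at least two digits: {year_text!r}")
```

Under this reading, `"01"` maps to 2001 but `"0001"` is rejected, because the add-2000 rule applies only to two-digit years.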

2. offset minutes larger than 59 don't lead to parsing failure

>>> parsedate_to_datetime('Sat, 15 Aug 0001 23:12:09 +0590')
datetime.datetime(2001, 8, 15, 23, 12, 9, tzinfo=datetime.timezone(datetime.timedelta(seconds=23400)))

expected: parse failure. Instead, the "90 minutes" component is parsed without issue (0590 being equal to 0630). The spec is actually not explicit about this, although "A date-time specification MUST be semantically valid". Note that a "90" value as minute in the time component does give the appropriate parsing failure.

Note: datetime.fromisoformat() has the same behavior. Also in this case, I can't determine whether ISO8601 explicitly disallows it. RFC3339 is clear on disallowing this.
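
For illustration, a strict check of the zone field could look like the sketch below. `parse_strict_offset` is a hypothetical name, not an existing function, and rejecting minutes above 59 is my reading of "semantically valid" rather than something the RFC spells out.

```python
import re
from datetime import timedelta, timezone

# Sign followed by exactly four ASCII digits (HHMM).
_ZONE_RE = re.compile(r"([+-])([0-9]{2})([0-9]{2})")


def parse_strict_offset(zone: str) -> timezone:
    """Parse a numeric zone such as '+0530', rejecting minutes above 59.

    Hypothetical helper for illustration only.
    """
    match = _ZONE_RE.fullmatch(zone)
    if match is None:
        raise ValueError(f"malformed zone: {zone!r}")
    sign, hours, minutes = match.groups()
    if int(minutes) > 59:
        raise ValueError(f"offset minutes out of range: {zone!r}")
    delta = timedelta(hours=int(hours), minutes=int(minutes))
    # timezone() itself raises ValueError for offsets of 24 hours or more.
    return timezone(-delta if sign == "-" else delta)
```

With this check, `+0530` is accepted and `+0590` raises `ValueError` instead of silently becoming 6 hours 30 minutes (23400 seconds).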

3. Invalid day-of-week doesn't lead to parsing failure

>>> parsedate_to_datetime('Sun, 15 Aug 0001 23:12:09 +0520')  # actually a saturday

expected: parsing failure

A date-time specification MUST be semantically valid. That is, the
day-of-the-week (if included) MUST be the day implied by the date,
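
A caller who wants this check today could do it after parsing. The sketch below assumes the day name has already been split off the header; `day_name_matches` is a hypothetical helper, and the example date in the usage note is one I picked for illustration.

```python
from datetime import date

# RFC 2822 day names, ordered to match date.weekday() (Monday == 0).
_DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def day_name_matches(day_name: str, year: int, month: int, day: int) -> bool:
    """Return True if day_name is the weekday implied by the given date.

    Hypothetical helper for illustration only.
    """
    expected = _DAY_NAMES[date(year, month, day).weekday()]
    return day_name.capitalize() == expected
```

For example, 14 Nov 2024 is a Thursday, so `day_name_matches("Thu", 2024, 11, 14)` is true and `day_name_matches("Fri", 2024, 11, 14)` is false.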

4. Non-ASCII digits don't lead to parsing failure

If I'm reading the RFC correctly, only ASCII characters are valid.

>>> parsedate_to_datetime('Sat, 15 Aug 01 𝟚𝟛:𝟝𝟛:𝟛𝟛 +0500')  # note the fancy numbers
datetime.datetime(2001, 8, 15, 23, 53, 33, ...)

expected: parsing failure
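
The likely reason these digits are accepted is that `int()` and `str.isdigit()` both work on any Unicode decimal digit. A minimal sketch of a stricter test, using a hypothetical helper name `is_ascii_digits`:

```python
def is_ascii_digits(text: str) -> bool:
    """Return True only for a non-empty run of ASCII digits 0-9.

    Hypothetical helper for illustration only.
    """
    # str.isdigit() alone also accepts non-ASCII digits such as '𝟚𝟛',
    # so the isascii() check is what narrows it down to 0-9.
    return text.isascii() and text.isdigit()


# int() accepts Unicode decimal digits, which is how the fancy digits get through.
fancy_hour = int("𝟚𝟛")
```

Here `fancy_hour` is 23, while `is_ascii_digits("𝟚𝟛")` is false and `is_ascii_digits("23")` is true.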

5. Handling of the `-0000` case may be inconsistent with drive to eliminate the practice of "naive UTC" datetimes.

Lately, the datetime module appears to discourage the usage of naive datetimes to mean UTC, as evidenced by the deprecation of utcnow() and other methods.

However, parsedate_to_datetime will return a naive datetime in the -0000 case.

>>> parsedate_to_datetime("Sat, 15 Aug 01 23:53:33 -0000")
datetime.datetime(2001, 8, 15, 23, 53, 33)

expected: tzinfo=UTC

The spec says:

"-0000" also indicates Universal Time, it is
used to indicate that the time was generated on a system that may be
in a local time zone other than Universal Time and therefore
indicates that the date-time contains no information about the local
time zone.

The spec again is a bit fuzzy, but my reading here is that -0000 means "UTC, with no offset known". In contrast, +0000 means "UTC offset known to be 0". My impression would be that only omission of the offset should result in a naive datetime. What do you think?
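
In the meantime, a caller who prefers the "-0000 means UTC" reading can normalize the result themselves. This is only a sketch of that workaround; `parse_as_utc_if_naive` is a hypothetical wrapper, not a proposed API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_as_utc_if_naive(value: str) -> datetime:
    """Parse a date header and attach UTC when the result is naive.

    Hypothetical wrapper for illustration only. It relies on the documented
    behavior that a -0000 zone yields a naive datetime.
    """
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Treat "-0000" as UTC rather than as "no zone information".
        return parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Note that this throws away the distinction between `-0000` and `+0000`, which may or may not matter to the caller.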

CPython versions tested on:

3.13

Operating systems tested on:

macOS

edit: typo


encukou · 2024-11-15T14:07:12Z

Thanks for the report!

parsedate_to_datetime docs refer to parsedate(), whose docs say:

some mailers don’t follow that format as specified, so parsedate() tries to guess correctly in such cases.

That is, these functions parse but do not validate their input. Full validation is left for a third-party library. (Python could add it, taking a standard as an argument and validating based on that, but I don't think anyone's interested in maintaining it in stdlib.)

With that, IMO:

1. is a bug, that should be year 1. Do you want to send a PR?
2. is fine
3. is fine
4. is fine
5. is fine -- the docs cover this case explicitly. (AFAICS, RFC 2822 does not allow omitting the zone, so -0000 is the way to indicate a naive datetime.)

ariebovenberg · 2024-11-15T19:24:53Z

@encukou thanks—sorry I missed that context. I'm happy to submit a PR for the year issue.

bitdancer · 2025-06-10T18:04:31Z

For 1, yes, there is a bug here. A three digit year should have 1900 added, which currently doesn't happen. Will fixing that break anyone's code? I'm guessing not, so I'm OK with fixing it. But we should probably not backport it.

As for the four digit year, yes that should be an error: only dates 1900 and later are technically valid. Changing that probably would break someone's code, so I'm OK with returning four digit years before 1900 as written, since it is a reasonable Postel recovery (and actually makes more sense than restricting it to post 1900, IMO). That will change the current behavior for dates in the range 0000-0099, but again that seems unlikely to break anyone's code.

For 2 and 3 we'd ideally raise an error, but that might break existing code.

4 depends on whether utf8 is true or not. Accepting non-ascii unicode numbers is a reasonable Postel recovery, so I don't think we should change that. The new email API should (I think, I haven't tested it) register a defect for that case.

5 follows the RFC, as noted by encukou.

I would like to see the new API register defects for 1, 2, and 3, but doing that would require a fair bit of code tweaking, so we probably shouldn't waste time on it, at least currently.

ariebovenberg added the type-bug An unexpected behavior, bug, or error label Nov 14, 2024

picnixz added the extension-modules C modules in the Modules dir label Nov 14, 2024

tomasr8 added the topic-email label Nov 14, 2024

picnixz added stdlib Python modules in the Lib dir and removed extension-modules C modules in the Modules dir labels Nov 14, 2024

GGyll mentioned this issue May 20, 2025

gh-126845: Some edge cases in email.utils.parsedate_to_datetime seem to differ from RFC2822 spec #134311

Closed

bedevere-app bot mentioned this issue May 20, 2025

gh-126845: Some edge cases in email.utils.parsedate_to_datetime seem to differ from RFC2822 spec #126845 #134350

Closed

GGyll mentioned this issue May 21, 2025

gh-126845: Some edge cases in email.utils.parsedate_to_datetime seem to differ from RFC2822 spec #134438

Open
