Some edge cases in `email.utils.parsedate_to_datetime` seem to differ from RFC2822 spec #126845
Comments
Thanks for the report!
That is, these functions parse but do not validate their input. Full validation is left for a third-party library. (Python could add it, taking a standard as an argument and validating based on that, but I don't think anyone's interested in maintaining it in stdlib.) With that, IMO:
@encukou thanks, and sorry I missed that context. I'm happy to submit a PR for the year issue.
For 1, yes, there is a bug here. A three-digit year should have 1900 added, which currently doesn't happen. Will fixing that break anyone's code? I'm guessing not, so I'm OK with fixing it. But we should probably not backport it.

As for the four-digit year, yes, that should be an error: only dates 1900 and later are technically valid. Changing that probably would break someone's code, so I'm OK with returning four-digit years before 1900 as written, since it is a reasonable Postel recovery (and actually makes more sense than restricting it to post-1900, IMO). That will change the current behavior for dates in the range 0000-0099, but again that seems unlikely to break anyone's code.

For 2 and 3 we'd ideally raise an error, but that might break existing code. 4 depends on whether 5 follows the RFC, as noted by encukou.

I would like to see the new API register defects for 1, 2, and 3, but doing that would require a fair bit of code tweaking, so we probably shouldn't spend time on it, at least currently.
Bug report
Bug description:
While tinkering around with `email.utils.parsedate_to_datetime`, I found some behavior that may be worth adjusting.

1. low-number years aren't handled according to spec:
expected: either year 1, or a parsing failure. Neither the new nor the old format interprets 4-digit years this way.
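For reference, the year rule in RFC 2822 section 4.3 (two-digit years 00-49 get 2000 added; two-digit years 50-99 and any three-digit year get 1900 added) can be sketched as below. This is only an illustration: `normalize_obs_year` is a hypothetical helper, not part of `email.utils`, and returning four-digit years unchanged (even before 1900) is an assumption, not something the RFC mandates.

```python
def normalize_obs_year(year_text: str) -> int:
    """Interpret a year field using the obsolete-year rule of RFC 2822, section 4.3.

    Hypothetical helper for illustration; not part of email.utils.
    """
    if not (year_text.isascii() and year_text.isdigit()):
        raise ValueError(f"year must consist of ASCII digits: {year_text!r}")

    year = int(year_text)
    if len(year_text) == 2:
        # 00-49 -> 2000-2049, 50-99 -> 1950-1999
        return year + 2000 if year < 50 else year + 1900
    if len(year_text) == 3:
        # Any three-digit year has 1900 added.
        return year + 1900
    if len(year_text) >= 4:
        # Assumption: four-digit (or longer) years are returned as written.
        return year
    # A single digit matches neither `year` nor `obs-year` in the grammar.
    raise ValueError(f"year needs at least two digits: {year_text!r}")
```

Under this sketch `"97"` maps to 1997, `"21"` to 2021, `"120"` to 2020, and `"0001"` stays year 1.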
2. offset minutes larger than 59 don't lead to parsing failure
expected: parse failure. Instead, the "90 minutes" component is parsed without issue (0590 being equal to 0630). The spec is actually not explicit about this, although "A date-time specification MUST be semantically valid". Note that a "90" value as minute in the time component does give the appropriate parsing failure.
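A stricter check on the numeric zone could look like the following sketch. `is_valid_numeric_zone` is a hypothetical helper, and rejecting minutes above 59 is the behavior proposed here, not what the stdlib parser is documented to do. The last line just confirms the arithmetic behind "0590 being equal to 0630".

```python
import re
from datetime import timedelta

# Sign followed by exactly four ASCII digits: two for hours, two for minutes.
_NUMERIC_ZONE = re.compile(r"[+-](\d{2})(\d{2})", re.ASCII)


def is_valid_numeric_zone(zone: str) -> bool:
    """Return True if `zone` looks like +HHMM/-HHMM with minutes in 00..59.

    Hypothetical helper for illustration; not part of email.utils.
    """
    match = _NUMERIC_ZONE.fullmatch(zone)
    if match is None:
        return False
    return int(match.group(2)) <= 59


# 5 hours and 90 minutes is the same duration as 6 hours and 30 minutes.
same_offset = timedelta(hours=5, minutes=90) == timedelta(hours=6, minutes=30)
```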
Note: `datetime.fromisoformat()` has the same behavior. In this case too, I can't determine whether ISO8601 explicitly disallows it. RFC3339 is clear on disallowing this.

3. Invalid day-of-week doesn't lead to parsing failure
expected: parsing failure
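Until the parser itself rejects such input, a caller could check the day name against the parsed date. The sketch below assumes the day-of-week, when present, is the part before the first comma; `day_of_week_is_consistent` is a hypothetical helper, not an existing API.

```python
from email.utils import parsedate_to_datetime

# Index matches datetime.weekday(): Monday is 0, Sunday is 6.
_DAY_NAMES = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


def day_of_week_is_consistent(value: str) -> bool:
    """Return True if the optional day name agrees with the date that follows.

    Hypothetical helper for illustration; not part of email.utils.
    """
    head, comma, _rest = value.partition(",")
    if not comma:
        # The day-of-week is optional, so nothing to check.
        return True

    name = head.strip().lower()
    if name not in _DAY_NAMES:
        return False

    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable input cannot be consistent.
        return False
    return _DAY_NAMES[parsed.weekday()] == name
```

For example, `"Fri, 21 Nov 1997 09:55:06 -0600"` passes, while the same date labelled `Mon` does not.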
4. Non-ASCII digits don't lead to parsing failure
If I'm reading the RFC correctly, only ASCII characters are valid.
expected: parsing failure
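One likely reason such input gets through is that Python's `int()` accepts any Unicode decimal digit, not only `0`-`9`. A caller that wants ASCII-only input could guard for it up front, as in this sketch (`require_ascii` is a hypothetical helper, not part of the stdlib):

```python
def require_ascii(value: str) -> str:
    """Return `value` unchanged, or raise ValueError if it has non-ASCII characters.

    Hypothetical helper for illustration; not part of email.utils.
    """
    if not value.isascii():
        raise ValueError(f"non-ASCII characters in date string: {value!r}")
    return value


# ARABIC-INDIC DIGIT TWO followed by ARABIC-INDIC DIGIT ONE.
arabic_indic_21 = "\u0662\u0661"

# int() converts these digits without complaint ...
converted = int(arabic_indic_21)

# ... while str.isascii() flags them.
is_ascii = arabic_indic_21.isascii()
```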
5. Handling of the `-0000` case may be inconsistent with the drive to eliminate the practice of "naive UTC" datetimes.

Lately, the `datetime` module appears to discourage the usage of naive datetimes to mean UTC, as evidenced by the deprecation of `utcnow()` and other methods. However, `parsedate_to_datetime` will return a naive datetime in the `-0000` case.

expected: `tzinfo=UTC`
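A caller who wants an aware datetime for the `-0000` case today could wrap the parser as sketched below. `parse_as_aware_utc` is a hypothetical name, and treating every naive result as UTC is an assumption that discards the "no offset known" distinction.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_as_aware_utc(value: str) -> datetime:
    """Parse an email date, attaching UTC when the parser returns a naive datetime.

    Hypothetical wrapper for illustration; assumes a naive result means UTC.
    """
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Keep the wall-clock fields and mark them as UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed
```

With this wrapper, `-0000` and `+0000` inputs both come back with a zero UTC offset, and inputs with any other offset are returned as parsed.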
The spec says:
The spec again is a bit fuzzy, but my reading here is that `-0000` means "UTC, with no offset known". In contrast, `+0000` means "UTC offset known to be 0". My impression would be that only omission of the offset should result in a naive datetime. What do you think?

CPython versions tested on:
3.13
Operating systems tested on:
macOS
edit: typo