Skip to content

gh-69117: Stop checking for valid month and day in strptime regex #26215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions Lib/_strptime.py
Original file line number Diff line number Diff line change
Expand Up @@ -183,13 +183,13 @@ def __init__(self, locale_time=None):
base = super()
base.__init__({
# The " [1-9]" part of the regex is to make %c from ANSI C work
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems to indicate that %c might break if we don't allow for a space before a single digit, so I think we actually want (?P<d>\d{2}| {0,1}\d)" or (?P<d>\d{1,2}| \d).

That said, datetime.strptime(datetime(2021, 1, 1).strftime("%c"), "%c") seems to work OK with this change, so 🤷, maybe we just need to remove this comment? Chesterton's Fence indicates that we should probably try and figure out whether something has changed between when this comment was written and now.

'd': r"(?P<d>3[0-1]|[1-2]\d|0[1-9]|[1-9]| [1-9])",
'd': r"(?P<d>\d{1,2})",
'f': r"(?P<f>[0-9]{1,6})",
'H': r"(?P<H>2[0-3]|[0-1]\d|\d)",
'I': r"(?P<I>1[0-2]|0[1-9]|[1-9])",
'G': r"(?P<G>\d\d\d\d)",
'j': r"(?P<j>36[0-6]|3[0-5]\d|[1-2]\d\d|0[1-9]\d|00[1-9]|[1-9]\d|0[1-9]|[1-9])",
'm': r"(?P<m>1[0-2]|0[1-9]|[1-9])",
'm': r"(?P<m>\d{1,2})",
'M': r"(?P<M>[0-5]\d|\d)",
'S': r"(?P<S>6[0-1]|[0-5]\d|\d)",
'U': r"(?P<U>5[0-3]|[0-4]\d|\d)",
Expand Down
13 changes: 13 additions & 0 deletions Lib/test/datetimetester.py
Original file line number Diff line number Diff line change
Expand Up @@ -2611,6 +2611,19 @@ def test_strptime(self):
with self.assertRaises(ValueError): strptime("-000", "%z")
with self.assertRaises(ValueError): strptime("z", "%z")

def test_strptime_detailed_error(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things about these tests:

  1. I realize that the rest of this module uses this same style of "a bunch of asserts in a single file", but I think most of them were added before subTest was added to the standard library. I think it would be preferable to use subtests here, since failure in any one of the tests doesn't mean a prerequisite for the other tests would also fail.
  2. I really prefer the context manager style of assertRaises, since I think it's much easier to read.

If it were me, I'd probably put these tests in a loop, like so:

bad_month_re = "month must be in 1..12"
bad_day_re = "day is out of range for month"

tests = [
    ("0", "%m", bad_month_re),
    ("13", "%m", bad_month_re),
    ("0", "%d", bad_day_re),
    ("32", "%d", bad_day_re),
]

for dt_str, fmt_str, error_re in tests:
    with self.subTest(dt_str, fmt_str):
        with self.assertRaisesRegex(error_re):
            self.theclass.strptime(dt_str, fmt_str)

The use of a for loop for parameterization is something of a personal choice, though, feel free to skip that part. I find it a bit nicer because you can easily and cleanly add cases, but every control structure in a test is a possible source of error, so it's a bit of a toss-up.

# bpo-24929: detailed errors for month and day parsing failure

strptime = self.theclass.strptime

self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '0', '%m')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule I don't like assertRaisesRegex because I don't think the specific error message raised should be part of the public interface. That said, I don't see a better way to test this.

Can we add a comment saying something like, "These tests are regression tests to ensure we have a good error message. We do not intend to guarantee any one specific error message."

Edit: ...On second thought, this might be more of a problem than I originally thought, because this is actually simply propagating the error from the datetime.datetime constructor. This feels quite fragile in a way that I would usually not be super happy with.

I'd love any suggestions on how we can test this without relying on that particular implementation detail (I am very much open to the possibility of just not putting a test for this at all).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind the original ticket was basically, "the regex is giving a really opaque error, and if we simply rely on the datetime constructor instead, we will get a much more informative error"

but if the problem is "we're throwing an opaque error", maybe I can instead make a more useful error at the regex level.

strptime('10', '%m')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this is already covered by another test somewhere and is not necessary here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not required - I meant for clarity line them up in a (too low, error) - (just right, no error) - (too high, error) triplet - can remove

self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '13', '%m')

self.assertRaisesRegex(ValueError, "day is out of range for month", strptime, '0', '%d')
strptime('10', '%d')
self.assertRaisesRegex(ValueError, "day is out of range for month", strptime, '32', '%d')

def test_strptime_single_digit(self):
# bpo-34903: Check that single digit dates and times are allowed.

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Get more useful error messages when using strptime to parse strings with bad month or day values.