
[Windows] test_int.test_denial_of_service failed: took 15ms #114911

Open
encukou opened this issue Feb 2, 2024 · 24 comments
Labels
tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@encukou
Member

encukou commented Feb 2, 2024

test_int recently failed on a buildbot:

======================================================================
FAIL: test_denial_of_service_prevented_str_to_int (test.test_int.IntSubclassStrDigitLimitsTests.test_denial_of_service_prevented_str_to_int)
Regression test: ensure we fail before performing O(N**2) work.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "b:\uildarea\3.12.ware-win11.nondebug\build\Lib\test\test_int.py", line 743, in test_denial_of_service_prevented_str_to_int
    self.assertLessEqual(seconds_to_fail_extra_huge, seconds_to_convert/2)
AssertionError: 0.015625 not less than or equal to 0.0078125

I don't know for sure, but my guess is that a GC collection got triggered at the wrong time.

I'd like to add a test helper for timing CPU-bound tasks. It should disable GC and use process_time; perhaps it can do other tricks in the future too.
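A helper along those lines might be sketched roughly like this (illustrative only; the helper that eventually landed in test.support as CPUStopwatch may differ in detail):

```python
import gc
import time


class CPUStopwatch:
    """Context manager timing a CPU-bound block.

    Disables GC and uses time.process_time so that garbage
    collections and time spent in other processes don't inflate
    the reading. (Sketch based on the description above.)
    """
    def __enter__(self):
        self.gc_was_enabled = gc.isenabled()
        gc.disable()
        self.start = time.process_time()
        return self

    def __exit__(self, *exc):
        end = time.process_time()
        self.seconds = end - self.start
        if self.gc_was_enabled:
            gc.enable()


# Example: time a CPU-bound conversion
with CPUStopwatch() as sw:
    text = str(2 ** 20000)
assert sw.seconds >= 0
```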

Linked PRs

encukou added a commit to encukou/cpython that referenced this issue Feb 2, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
@encukou encukou added type-bug An unexpected behavior, bug, or error tests Tests in the Lib/test dir labels Feb 2, 2024
@vstinner
Member

AssertionError: 0.015625 not less than or equal to 0.0078125

The problem here is that time.monotonic() has a resolution of 15.6 ms on Windows. It can return 0 or 15.6 ms for short timing, nothing in between. I added "CLOCK_RES" to some tests:

$ grep 'CLOCK_RES =' Lib/test/ -R
Lib/test/test_asyncio/utils.py:CLOCK_RES = 0.050
Lib/test/_test_eintr.py:CLOCK_RES = 0.020
Lib/test/_test_multiprocessing.py:CLOCK_RES = 0.100
Lib/test/test_os.py:    CLOCK_RES = CLOCK_RES_NS * 1e-9
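For reference, the reported (not necessarily effective) resolutions can be queried with time.get_clock_info:

```python
import time

# The reported resolution varies by platform. Note that on Windows,
# process_time reports 1e-07 even though its effective granularity
# is ~15.6 ms, as discussed below.
for name in ("monotonic", "perf_counter", "process_time"):
    info = time.get_clock_info(name)
    print(f"{name}: resolution={info.resolution}, "
          f"implementation={info.implementation}")
```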

@vstinner
Member

The problem here is that time.monotonic() has a resolution of 15.6 ms on Windows

I don't know if it's the root issue or not. I will try to reproduce and debug the issue on Windows.

@encukou
Member Author

encukou commented Feb 12, 2024

Hm, this is using process_time. Should it use perf_counter instead?

@sobolevn
Member

Happened again: #115504 (comment)

encukou added a commit that referenced this issue Feb 28, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
@encukou
Member Author

encukou commented Oct 8, 2024

@terryjreedy, do you see this on a machine where you could test a fix?

@terryjreedy
Member

I observed 2 failures in about 18 tries on an installed 3.13.0 non-debug GIL build. I could hand-patch (copy-patch) a bit of test_int.

On a fairly fresh local debug no-GIL build I got 0 failures in 40 tries. It's easy to patch from the PR, but I need something that fails in order to test a fix.

@vstinner
Member

The problem here is that time.monotonic() has a resolution of 15.6 ms on Windows.

I changed time.monotonic() in Python 3.13 to use QueryPerformanceCounter(). It now has a resolution better than 1 microsecond.

@chris-eibl
Member

The problem here is that time.monotonic() has a resolution of 15.6 ms on Windows.

test_denial_of_service_prevented_str_to_int uses process_time (before and after Petr's PR #114912).

process_time claims a resolution of 1e-07 on Windows, but that's just the "unit" used for reporting, see e.g. #82040 (comment).

Like @eryksun mentioned there, the "real" resolution is 15.625 ms by default, hence the 0.015625 in the traceback.

On my PC, sw_convert.seconds is on the order of 20 milliseconds (wall time).
Here are example frequencies of process_time readings for sw_convert.seconds over 100 runs, collected with collections.Counter:
Counter({0.015625: 67, 0.03125: 30, 0.0: 3})
That is: in about two out of three cases sw_convert.seconds equals 0.015625, three times the test bails out with SkipTest('"slow" conversion'), and 30% of the time I end up with 0.03125.

It can return 0 or 15.6 ms for short timing, nothing in between.

Exactly this happens in very rare cases for sw_fail_huge.seconds or sw_fail_extra_huge.seconds, even though they take only 2 to 4 microseconds of wall time on my PC. Almost always this yields process_time=0, but sometimes you get process_time=15.625 ms and then the test fails.
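A sketch of how such a frequency table can be collected (the workload is illustrative, not the exact test payload):

```python
import collections
import time


def sample_process_time(work, runs=100):
    """Measure `work` with time.process_time `runs` times.

    On a coarse clock, the samples collapse onto multiples of the
    effective resolution, e.g. 0.0, 0.015625, 0.03125 on Windows.
    """
    counter = collections.Counter()
    for _ in range(runs):
        start = time.process_time()
        work()
        counter[time.process_time() - start] += 1
    return counter


# e.g.: sample_process_time(lambda: str(2 ** 100_000))
```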

If we fix the reported resolution of process_time, then we will have to increase the digit count for Windows, because otherwise we would almost always skip the test.
Or calculate the needed digits at runtime instead of hard-coding them, as @time-one suggests in #118686.

Hm, this is using process_time. Should it use perf_counter instead?

This would fix Windows and any other OS with such poor resolution for process_time. All platforms should have "good enough" resolution in the case of perf_counter:

[...] i.e. a clock with the highest available resolution to measure a short duration.

But then, should CPUStopwatch be renamed to just Stopwatch or WalltimeStopwatch?

@vstinner vstinner changed the title test_int.test_denial_of_service failed: took 15ms [Windows] test_int.test_denial_of_service failed: took 15ms Jan 9, 2025
@chris-eibl
Member

chris-eibl commented Mar 15, 2025

Can I open a PR using time.monotonic in class CPUStopwatch? IMHO, the few users of it would all be fine with wall time, since they do CPU-intense work, so wall time should match process time anyway.

Or at least for Windows, where the resolution is just too bad to use it?

@chris-eibl
Member

The only other user of CPUStopwatch I found is

cpython/Lib/test/test_re.py

Lines 2474 to 2485 in 7ae9c5d

def test_search_anchor_at_beginning(self):
    s = 'x'*10**7
    with CPUStopwatch() as stopwatch:
        for p in r'\Ay', r'^y':
            self.assertIsNone(re.search(p, s))
            self.assertEqual(re.split(p, s), [s])
            self.assertEqual(re.findall(p, s), [])
            self.assertEqual(list(re.finditer(p, s)), [])
            self.assertEqual(re.sub(p, '', s), s)
    # Without optimization it takes 1 second on my computer.
    # With optimization -- 0.0003 seconds.
    self.assertLess(stopwatch.seconds, 0.1)

where according to the comment time.monotonic would do its job, too.

@chris-eibl
Member

IMHO, wall time is a better fit, because https://docs.python.org/3/library/time.html#time.process_time

N.B.:
- This *includes* time spent in other threads.

@encukou
Member Author

encukou commented Mar 19, 2025

AFAIK the test suite doesn't have a threaded mode, so this should not matter much? It definitely would be better if it didn't include time spent in other threads, but, at least it doesn't include time spent in other processes.
On a loaded system (or if I run tests with -j30 on a 20-core machine, to fill in all the sleeps test_asyncio does), I'd expect wall-clock time to be less accurate.

time.perf_counter might be a better fit than time.monotonic, though in practice they're the same thing.

Can I open a PR using time.monotonic in class CPUStopwatch?

Go ahead. You could also rename it to Stopwatch. That way we can see the change on buildbots.
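A wall-clock variant along the proposed lines might look like this (a sketch, not the actual change in test.support):

```python
import time


class Stopwatch:
    """Wall-clock stopwatch (sketch of the proposed rename).

    Uses time.perf_counter, which the docs describe as a clock
    with the highest available resolution to measure a short
    duration.
    """
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.seconds = time.perf_counter() - self.start


with Stopwatch() as sw:
    sum(range(100_000))
print(f"{sw.seconds:.6f}s")
```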

@chris-eibl
Member

chris-eibl commented Mar 19, 2025

Yeah, I fully agree: in theory, CPU time would be the better fit, it's just that on Windows the resolution is too poor.
And on "some platforms like WASM"

if get_time() <= 0:  # some platforms like WASM lack process_time()
    get_time = time.monotonic
    clock_info = time.get_clock_info('monotonic')

we already use time.monotonic, i.e. not process_time.
So let's switch everywhere to time.monotonic, or to perf_counter: the docs do not mention that it might be unavailable anywhere; rather, it is "a clock with the highest available resolution" :)

@encukou
Member Author

encukou commented Mar 20, 2025

in theory, CPU time would be the better fit

That makes me think: should this continue using process_time on non-Windows platforms?

@chris-eibl
Member

IMHO it is better to use the same clock on all platforms. Otherwise, we might end up with platform-specific ifs if process_time and perf_counter deviate. Let's try to go with one that fits all?

@encukou
Member Author

encukou commented Mar 24, 2025

Sounds reasonable.
But, I'd make that the first thing to change if we start getting failures on Linux.

encukou added a commit that referenced this issue Apr 28, 2025
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@encukou
Member Author

encukou commented Apr 28, 2025

Thanks for the PR!
Merged; let's see if we start getting failures on other systems.

@encukou
Member Author

encukou commented May 7, 2025

I just saw a failure on Windows:

======================================================================
FAIL: test_denial_of_service_prevented_str_to_int (test.test_int.IntStrDigitLimitsTests.test_denial_of_service_prevented_str_to_int)
Regression test: ensure we fail before performing O(N**2) work.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "b:\uildarea\3.x.ware-win11.nondebug\build\Lib\test\test_int.py", line 659, in test_denial_of_service_prevented_str_to_int
    self.assertLessEqual(sw_fail_extra_huge.seconds, sw_convert.seconds/2)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 0.0491769015789032 not less than or equal to 0.019058849662542343

----------------------------------------------------------------------
Ran 51 tests in 39.698s

FAILED (failures=1)

edit: and Linux

======================================================================
FAIL: test_denial_of_service_prevented_str_to_int (test.test_int.IntSubclassStrDigitLimitsTests.test_denial_of_service_prevented_str_to_int)
Regression test: ensure we fail before performing O(N**2) work.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-ppc64le.lto-pgo/build/Lib/test/test_int.py", line 659, in test_denial_of_service_prevented_str_to_int
    self.assertLessEqual(sw_fail_extra_huge.seconds, sw_convert.seconds/2)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 0.05159671300498303 not less than or equal to 0.02855902150258771

----------------------------------------------------------------------
Ran 51 tests in 91.714s

FAILED (failures=1)

@chris-eibl
Member

Yeah, unfortunately, it didn't help. I have seen some Windows bots failing too, but you've also spotted a Linux one. If I am not mistaken, those never failed before?

In isolation, I couldn't get it to fail anymore on Windows, but on loaded hosts process times are indeed better, as you anticipated.

This test_denial_of_service_prevented_str_to_int is really a tough one :)

#125076 already dealt with it, too, and maybe we should go with the suggestion from @gpshead

if it is only being flaky on a specific platform feel free to add a platform specific skip. we only need this to run in CI on at least one tier1 platform. 0.015625 not less than or equal to 0.0078125 suggests the timer resolution is 0.0078125 and thus it's easy to be skewed by interruptions or aliasing around timer tick boundaries. potentially more common on faster machines.

I suggest to:

  1. revert back to process time. But what to do on WASI? Would we really want to fall back to something conceptually different? According to @gpshead, it is better to skip the test on platforms where we cannot reliably run it due to the lack of process_time (or of sufficient resolution).

  2. fix the reported resolution of process_time on Windows, see details above. I can create an issue and PR. This would automatically skip the test on Windows; that's why I haven't done it so far, because I had some hope we could cope with the wall-time workaround.

  3. if we'd like to keep testing this on Windows, we can increase digits = 133700. After all, as @encukou mentioned, converting to str was optimized so much that it is now too fast relative to the resolution of process_time on Windows.
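The skip from option 1 could be expressed as a resolution-based guard (hypothetical names; on Windows this would only take effect once the reported resolution is made truthful, as per option 2):

```python
import time
import unittest

# Hypothetical guard: skip the timing-sensitive test where the
# process-time clock is too coarse to resolve the fast path.
PROCESS_TIME_RES = time.get_clock_info("process_time").resolution


@unittest.skipIf(PROCESS_TIME_RES > 1e-3,
                 "process_time too coarse for this timing test")
class TimingSensitiveTests(unittest.TestCase):
    def test_fast_path_is_fast(self):
        start = time.process_time()
        str(2 ** 1000)  # stand-in for the real conversion under test
        self.assertLess(time.process_time() - start, 1.0)
```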

@encukou
Member Author

encukou commented May 8, 2025

better skip it on platforms where we cannot reliably test it

Yeah, it might be better to skip it on platforms that don't have a reliable high-resolution per-thread CPU time counter. Which is... all of them.

So, give up and remove the test?

I'm not aware of any other part of the codebase that's tested for accidental quadratic behaviour.

@chris-eibl
Member

I'm not aware of any other part of the codebase that's tested for accidental quadratic behaviour.

Just recently, in #134874 a test was requested.

There, just time.time() is used. Well, if the test runs long enough, the resolution doesn't matter (but then it is unnecessarily CPU-intense). Hopefully, this doesn't end in problems similar to test_denial_of_service_prevented_str_to_int.

Here, the implementation was optimized a lot but the initial test was left untouched AFAICT, so we dropped below the clock resolution on Windows.

We could increase

digits = 133700

or just

So, give up and remove the test?

@picnixz
Member

picnixz commented Jun 9, 2025

I'm not aware of any other part of the codebase that's tested for accidental quadratic behaviour.

We have some tests, mainly for ReDoS:

def test_iso2time_performance_regression(self):

def test_http2time_redos_regression_actually_completes(self):

def test_apop_REDOS(self):

def test_is_line_junk_REDOS(self):

And once I find a way to fix the issues in email/_header_value_parser.py, I also plan to add some tests.

@chris-eibl
Member

And the interesting thing for all those tests above: they do not measure the time :)

@picnixz
Member

picnixz commented Jun 10, 2025

Yes, we just check that we're not spending hours, because in general that's what would happen otherwise. Unfortunately, I don't know if this is possible for test_denial_of_service: I don't know if there's an input that completes fast when the algorithm is correct but would be very slow otherwise.
