
[Windows] test_int.test_denial_of_service failed: took 15ms #114911

Open
encukou opened this issue Feb 2, 2024 · 24 comments
Labels
tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@encukou
Member

encukou commented Feb 2, 2024

test_int recently failed on a buildbot:

======================================================================
FAIL: test_denial_of_service_prevented_str_to_int (test.test_int.IntSubclassStrDigitLimitsTests.test_denial_of_service_prevented_str_to_int)
Regression test: ensure we fail before performing O(N**2) work.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "b:\uildarea\3.12.ware-win11.nondebug\build\Lib\test\test_int.py", line 743, in test_denial_of_service_prevented_str_to_int
    self.assertLessEqual(seconds_to_fail_extra_huge, seconds_to_convert/2)
AssertionError: 0.015625 not less than or equal to 0.0078125

I don't know for sure, but my guess is that a GC collection got triggered at the wrong time.

I'd like to add a test helper for timing CPU-bound tasks. It should disable GC and use process_time; perhaps it can do other tricks in the future too.
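A helper along those lines might be sketched roughly like this (illustrative only; the helper that eventually landed in test.support as CPUStopwatch may differ in detail):

```python
import gc
import time


class CPUStopwatch:
    """Context manager timing a CPU-bound block.

    Disables GC and uses time.process_time so that garbage
    collections and time spent in other processes don't inflate
    the reading. (Sketch based on the description above.)
    """
    def __enter__(self):
        self.gc_was_enabled = gc.isenabled()
        gc.disable()
        self.start = time.process_time()
        return self

    def __exit__(self, *exc):
        end = time.process_time()
        self.seconds = end - self.start
        if self.gc_was_enabled:
            gc.enable()


# Example: time a CPU-bound conversion
with CPUStopwatch() as sw:
    text = str(2 ** 20000)
assert sw.seconds >= 0
```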

Linked PRs

encukou added a commit to encukou/cpython that referenced this issue Feb 2, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
@encukou encukou added type-bug An unexpected behavior, bug, or error tests Tests in the Lib/test dir labels Feb 2, 2024
@vstinner
Member

AssertionError: 0.015625 not less than or equal to 0.0078125

The problem here is that time.monotonic() has a resolution of 15.6 ms on Windows. It can return 0 or 15.6 ms for short timing, nothing in between. I added "CLOCK_RES" to some tests:

$ grep 'CLOCK_RES =' Lib/test/ -R
Lib/test/test_asyncio/utils.py:CLOCK_RES = 0.050
Lib/test/_test_eintr.py:CLOCK_RES = 0.020
Lib/test/_test_multiprocessing.py:CLOCK_RES = 0.100
Lib/test/test_os.py:    CLOCK_RES = CLOCK_RES_NS * 1e-9
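For reference, the reported (not necessarily effective) resolutions can be queried with time.get_clock_info:

```python
import time

# The reported resolution varies by platform. Note that on Windows,
# process_time reports 1e-07 even though its effective granularity
# is ~15.6 ms, as discussed below.
for name in ("monotonic", "perf_counter", "process_time"):
    info = time.get_clock_info(name)
    print(f"{name}: resolution={info.resolution}, "
          f"implementation={info.implementation}")
```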

@vstinner
Member

The problem here is that time.monotonic() has a resolution of 15.6 ms on Windows

I don't know if it's the root issue or not. I will try to reproduce and debug the issue on Windows.

@encukou
Member Author

encukou commented Feb 12, 2024

Hm, this is using process_time. Should it use perf_counter instead?

@sobolevn
Member

Happened again: #115504 (comment)

encukou added a commit that referenced this issue Feb 28, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
A few of our tests measure the time of CPU-bound operation, mainly
to avoid quadratic or worse behaviour.
Add a helper to ignore GC and time spent in other processes.
@encukou
Member Author

encukou commented Oct 8, 2024

@terryjreedy, do you see this on a machine where you could test a fix?

@terryjreedy
Member

I observed 2 failures in about 18 tries on an installed 3.13.0 non-debug GIL build. I could hand-patch (copy-patch) a bit of test_int.

On a fairly fresh local debug no-GIL build I got 0 failures in 40 tries. It's easy to patch from the PR, but I need something that fails in order to test a fix.

@vstinner
Member

The problem here is that time.monotonic() has a resolution of 15.6 ms on Windows.

I changed time.monotonic() in Python 3.13 to use QueryPerformanceCounter(). It now has a resolution better than 1 microsecond.

@chris-eibl
Member

The problem here is that time.monotonic() has a resolution of 15.6 ms on Windows.

test_denial_of_service_prevented_str_to_int uses process_time (before and after Petr's PR #114912).

process_time claims a resolution of 1e-07 on Windows, but that's just the "unit" used for reporting, see e.g. #82040 (comment).

Like @eryksun mentioned there, the "real" resolution is 15.625 ms by default, hence the 0.015625 in the traceback.

On my PC, sw_convert.seconds is on the order of 20 milliseconds (wall time).
Here are example frequencies of process_time readings for sw_convert.seconds over 100 runs, collected with collections.Counter:
Counter({0.015625: 67, 0.03125: 30, 0.0: 3})
That is: in about two out of three cases sw_convert.seconds equals 0.015625, three times the test bails out with SkipTest('"slow" conversion'), and 30% of the time I end up with 0.03125.

It can return 0 or 15.6 ms for short timing, nothing in between.

Exactly this happens in very rare cases for sw_fail_huge.seconds or sw_fail_extra_huge.seconds, even though they take only 2 to 4 microseconds of wall time on my PC. Almost always this yields process_time=0, but sometimes you get process_time=15.625 ms and then the test fails.
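A sketch of how such a frequency table can be collected (the workload is illustrative, not the exact test payload):

```python
import collections
import time


def sample_process_time(work, runs=100):
    """Measure `work` with time.process_time `runs` times.

    On a coarse clock, the samples collapse onto multiples of the
    effective resolution, e.g. 0.0, 0.015625, 0.03125 on Windows.
    """
    counter = collections.Counter()
    for _ in range(runs):
        start = time.process_time()
        work()
        counter[time.process_time() - start] += 1
    return counter


# e.g.: sample_process_time(lambda: str(2 ** 100_000))
```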

If we fix the reported resolution of process_time, then we will have to increase the digit count for Windows, because otherwise we would almost always skip the test.
Or calculate the needed digits at runtime instead of hard-coding them, as @time-one suggests in #118686.

Hm, this is using process_time. Should it use perf_counter instead?

This would fix Windows and any other OS with such poor resolution for process_time. All platforms should have "good enough" resolution in the case of perf_counter:

[...] i.e. a clock with the highest available resolution to measure a short duration.

But then, should CPUStopwatch be renamed to just Stopwatch or WalltimeStopwatch?

@vstinner vstinner changed the title test_int.test_denial_of_service failed: took 15ms [Windows] test_int.test_denial_of_service failed: took 15ms Jan 9, 2025
@chris-eibl
Member

chris-eibl commented Mar 15, 2025

Can I open a PR using time.monotonic in class CPUStopwatch? IMHO, the few users of it would all be fine with wall time, since they do CPU-intense work, so wall time should match process time anyway.

Or at least for Windows, where the resolution is just too bad to use it?

@chris-eibl
Member

The only other user of CPUStopwatch I found is

cpython/Lib/test/test_re.py

Lines 2474 to 2485 in 7ae9c5d

def test_search_anchor_at_beginning(self):
    s = 'x'*10**7
    with CPUStopwatch() as stopwatch:
        for p in r'\Ay', r'^y':
            self.assertIsNone(re.search(p, s))
            self.assertEqual(re.split(p, s), [s])
            self.assertEqual(re.findall(p, s), [])
            self.assertEqual(list(re.finditer(p, s)), [])
            self.assertEqual(re.sub(p, '', s), s)
    # Without optimization it takes 1 second on my computer.
    # With optimization -- 0.0003 seconds.
    self.assertLess(stopwatch.seconds, 0.1)

where according to the comment time.monotonic would do its job, too.

@chris-eibl
Member

IMHO, wall time is a better fit, because https://docs.python.org/3/library/time.html#time.process_time

N.B.:
- This *includes* time spent in other threads.

@encukou
Member Author

encukou commented Mar 19, 2025

AFAIK the test suite doesn't have a threaded mode, so this should not matter much? It definitely would be better if it didn't include time spent in other threads, but, at least it doesn't include time spent in other processes.
On a loaded system (or if I run tests with -j30 on a 20-core machine, to fill in all the sleeps test_asyncio does), I'd expect wall-clock time to be less accurate.

time.perf_counter might be a better fit than time.monotonic, though in practice they're the same thing.

Can I open a PR using time.monotonic in class CPUStopwatch?

Go ahead. You could also rename it to Stopwatch. That way we can see the change on buildbots.
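A wall-clock variant along the proposed lines might look like this (a sketch, not the actual change in test.support):

```python
import time


class Stopwatch:
    """Wall-clock stopwatch (sketch of the proposed rename).

    Uses time.perf_counter, which the docs describe as a clock
    with the highest available resolution to measure a short
    duration.
    """
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.seconds = time.perf_counter() - self.start


with Stopwatch() as sw:
    sum(range(100_000))
print(f"{sw.seconds:.6f}s")
```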

@chris-eibl
Member

chris-eibl commented Mar 19, 2025

Yeah, I fully agree: in theory, CPU time would be the better fit, it's just that on Windows the resolution is too poor.
And on "some platforms like WASM"

if get_time() <= 0:  # some platforms like WASM lack process_time()
    get_time = time.monotonic
    clock_info = time.get_clock_info('monotonic')

we already use time.monotonic, i.e. not process_time.
So let's switch everywhere to time.monotonic, or to perf_counter: the docs do not mention that it might be unavailable anywhere; rather, it is "a clock with the highest available resolution" :)

@encukou
Member Author

encukou commented Mar 20, 2025

in theory, CPU time would be the better fit

That makes me think: should this continue using process_time on non-Windows platforms?

@chris-eibl
Member

IMHO it is better to use the same clock on all platforms. Otherwise, we might end up with platform-specific ifs if process_time and perf_counter deviate. Let's try to go with one that fits all?

@encukou
Member Author

encukou commented Mar 24, 2025

Sounds reasonable.
But, I'd make that the first thing to change if we start getting failures on Linux.

encukou added a commit that referenced this issue Apr 28, 2025
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@encukou
Member Author

encukou commented Apr 28, 2025

Thanks for the PR!
Merged; let's see if we start getting failures on other systems.

@encukou
Member Author

encukou commented May 7, 2025

I just saw a failure on Windows:

======================================================================
FAIL: test_denial_of_service_prevented_str_to_int (test.test_int.IntStrDigitLimitsTests.test_denial_of_service_prevented_str_to_int)
Regression test: ensure we fail before performing O(N**2) work.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "b:\uildarea\3.x.ware-win11.nondebug\build\Lib\test\test_int.py", line 659, in test_denial_of_service_prevented_str_to_int
    self.assertLessEqual(sw_fail_extra_huge.seconds, sw_convert.seconds/2)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 0.0491769015789032 not less than or equal to 0.019058849662542343

----------------------------------------------------------------------
Ran 51 tests in 39.698s

FAILED (failures=1)

edit: and Linux

======================================================================
FAIL: test_denial_of_service_prevented_str_to_int (test.test_int.IntSubclassStrDigitLimitsTests.test_denial_of_service_prevented_str_to_int)
Regression test: ensure we fail before performing O(N**2) work.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-ppc64le.lto-pgo/build/Lib/test/test_int.py", line 659, in test_denial_of_service_prevented_str_to_int
    self.assertLessEqual(sw_fail_extra_huge.seconds, sw_convert.seconds/2)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 0.05159671300498303 not less than or equal to 0.02855902150258771

----------------------------------------------------------------------
Ran 51 tests in 91.714s

FAILED (failures=1)

@chris-eibl
Member

Yeah, unfortunately, it didn't help. I have seen some Windows bots failing too, but you've also spotted a Linux one. If I am not mistaken, those never failed before?

In isolation, I couldn't get it to fail anymore on Windows, but on loaded hosts process times are indeed better, as you anticipated.

This test_denial_of_service_prevented_str_to_int is really a tough one :)

#125076 already dealt with it, too, and maybe we should go with the suggestion from @gpshead

if it is only being flaky on a specific platform feel free to add a platform specific skip. we only need this to run in CI on at least one tier1 platform. 0.015625 not less than or equal to 0.0078125 suggests the timer resolution is 0.0078125 and thus it's easy to be skewed by interruptions or aliasing around timer tick boundaries. potentially more common on faster machines.

I suggest to:

  1. revert back to process time. But what to do on WASI? Would we really want to fall back to something conceptually different? According to @gpshead, it is better to skip the test on platforms where we cannot reliably run it due to the lack of process_time (or of sufficient resolution).

  2. fix the reported resolution of process_time on Windows, see details above. I can create an issue and PR. This would automatically skip the test on Windows; that's why I haven't done it so far, because I had some hope we could cope with the wall-time workaround.

  3. if we'd like to keep testing this on Windows, we can increase digits = 133700. After all, as @encukou mentioned, converting to str was optimized so much that it is now too fast relative to the resolution of process_time on Windows.
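The skip from option 1 could be expressed as a resolution-based guard (hypothetical names; on Windows this would only take effect once the reported resolution is made truthful, as per option 2):

```python
import time
import unittest

# Hypothetical guard: skip the timing-sensitive test where the
# process-time clock is too coarse to resolve the fast path.
PROCESS_TIME_RES = time.get_clock_info("process_time").resolution


@unittest.skipIf(PROCESS_TIME_RES > 1e-3,
                 "process_time too coarse for this timing test")
class TimingSensitiveTests(unittest.TestCase):
    def test_fast_path_is_fast(self):
        start = time.process_time()
        str(2 ** 1000)  # stand-in for the real conversion under test
        self.assertLess(time.process_time() - start, 1.0)
```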

@encukou
Member Author

encukou commented May 8, 2025

better skip it on platforms where we cannot reliably test it

Yeah, it might be better to skip it on platforms that don't have a reliable high-resolution per-thread CPU time counter. Which is... all of them.

So, give up and remove the test?

I'm not aware of any other part of the codebase that's tested for accidental quadratic behaviour.

@chris-eibl
Member

I'm not aware of any other part of the codebase that's tested for accidental quadratic behaviour.

Just recently, in #134874 a test was requested.

There, just time.time() is used. Well, if the test runs long enough, the resolution doesn't matter (but then it is unnecessarily CPU-intense). Hopefully, this doesn't end in problems similar to test_denial_of_service_prevented_str_to_int.

Here, the implementation was optimized a lot but the initial test was left untouched AFAICT, so we dropped below the clock resolution on Windows.

We could increase

digits = 133700

or just

So, give up and remove the test?

@picnixz
Member

picnixz commented Jun 9, 2025

I'm not aware of any other part of the codebase that's tested for accidental quadratic behaviour.

We have some tests, mainly for ReDoS:

def test_iso2time_performance_regression(self):

def test_http2time_redos_regression_actually_completes(self):

def test_apop_REDOS(self):

def test_is_line_junk_REDOS(self):

And once I find a way to fix the issues in email/_header_value_parser.py, I also plan to add some tests.

@chris-eibl
Member

And the interesting thing for all those tests above: they do not measure the time :)

@picnixz
Member

picnixz commented Jun 10, 2025

Yes, we just check that we're not spending hours, because in general that's what would happen otherwise. Unfortunately, I don't know if this is possible for test_denial_of_service: I don't know if there's an input that completes fast when the algorithm is correct but would be very slow otherwise.
