Skip to content

gh-123471: Make concurrent iteration over itertools.cycle safe under free-threading #131212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 2, 2025

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Mar 13, 2025

See #124397

  • In the FT build we cannot clear the iterator cycle->it. Instead we use cycle->index as a check whether the iterator has been exhausted or not.
  • We remove dead code related to cycle->firstpass
  • The unit test is combined with itertools.batched in a single file.

@rhettinger
Copy link
Contributor

In the "Strategy for Free Threading" umbrella issue, principle 3 is:

Other iterators implemented in C will get only the minimal changes necessary to cause them to not crash in a free-threaded build. The edits should be made in a way that does not impact existing semantics or performance (i.e. do not damage the standard GIL build). Concurrent access is allowed to return duplicate values, skip values, or raise an exception.

Given that plan, can you please articulate reason for this PR. Is there currently a risk of a crash?

Also what is specifically is meant by "safe under free-threading"? AFAICT there is no concurrent access use case for cycle() that wouldn't be met by the planned options to either wrap with serialize() or tee() depending on whether the user wants atomic/consecutive semantics or parallel/independent semantics. That is what we would do for an equivalent generator.

The itertools code was highly refined and micro-optimized with painful tooling (valgrind, vtune, etc), so I'm a bit reluctant to change long stable code unless it is really necessary. Also, I was waiting to see how the essential builtin functions/classes evolved so that itertools could follow a proven model rather than being on the bleeding edge where misadventures might persist for a long time before being noticed.

@eendebakpt
Copy link
Contributor Author

The current implementation of itertools.cycle can crash under the free-threading build for two reasons:

  • Concurrent iteration can result in lz->index being incremented by a thread to a value of PyList_GET_SIZE(lz->saved)while in another thread the non thread-safePyList_GET_ITEM` is used.

item = PyList_GET_ITEM(lz->saved, lz->index);
lz->index++;
if (lz->index >= PyList_GET_SIZE(lz->saved))
lz->index = 0;

  • One thread can exhaust the iterator lz->it and clear it

Py_CLEAR(lz->it);
}

while another thread is still using lz->it (since the cycleobject might be holding the only reference to the iterator this is not safe). The test added in this PR will often crash the interpreter in the free-threading build on my system (increasing the number_of_iterations will increase the odds).

By "safe under free-threading" I mean the free-threading build will not crash. No guarantees are given about any "correctness" of the results (I can adapt the title and news entry if desired). It is true that using the planned serialize would be a good solution for someone planning to use the cycle with concurrent iteration. But I have not doubt some users will be using the itertools.cycle in a free-threading setting (perhaps even being unaware of this via 3rd party packages). I believe we should make the cpython free-threading safe against race conditions leading to crashes like this. These kind of bugs can be difficult to debug, and could potentially only trigger very rarely.

@rhettinger
Copy link
Contributor

rhettinger commented Mar 17, 2025

Okay, this sounds reasonable :-) Thanks for explaining the purpose of the edit.

Please do check that performance hasn't been negatively impacted (if so, the make an ifdef so that the baseline isn't impacted; if not, then we're good).

@eendebakpt
Copy link
Contributor Author

@rhettinger In the benchmarks I did this PR is faster for the normal build. There are two reasons for this

  • The unused variable firstpass is removed, making the cycleobject smaller in memory and avoiding the initialiation of the variable in the constructor. (note: this change is not related to FT)
  • Once the iterator is exhausted, we only access lz->index and lz->saved. In current main in addition the lz->it is accessed.

The gain from this PR is small though (and I would not be surprised if for different platforms or benchmark tests the gain is zero).

Here are the results of one of the benchmarks I did:

Benchmark code
import time
import gc
from itertools import cycle

t0=time.perf_counter()
for ii in range(100):
    c  = cycle( (1, 2, 3, 4))
    
    for _ in range(200):
        next(c)

gc.collect() # make sure that in both the normal and free-threading build we clean up the constructed objects
dt=time.perf_counter()-t0
print(dt)
Script to generate and plot the data
""" Interleaved benchmark for itertools.cycle

@author: eendebakpt
"""

import subprocess

import matplotlib.pyplot as plt
import numpy as np

test_script = '/home/eendebakpt/python_testing/benchmarks/bm_itertools_cycle.py'
cmds = ["/home/eendebakpt/cpython0/python {test_script", "/home/eendebakpt/cpython/python {test_script}" ]

verbose = False
tt = []
for ii in range(600):
    print(f"run {ii}")
    for cmd in cmds:
        p = subprocess.run(cmd, shell=True, check=True, capture_output=True, encoding="utf-8")
        if verbose:
            print(f"Command {p.args} exited with {p.returncode} code, output: \n{p.stdout}")
        tt.append(float(p.stdout))

tt_main = tt[::2]
tt_pr = tt[1::2]

# %% Show results
plt.figure(10)
plt.clf()
plt.plot(tt_main[::2], ".", label="Main")
plt.plot(tt_pr[1::2], ".", label="PR")
plt.axhline(np.mean(tt_main), color="C0", label="mean for main")
plt.axhline(np.mean(tt_pr), color="C1", label="mean for PR")
plt.ylabel("Execution time [s]")
plt.legend()

gain = np.mean(tt_main) / np.mean(tt_pr)
plt.title(f"Performance gain: {gain:.3f}")

image

@kumaraditya303 kumaraditya303 self-requested a review May 28, 2025 15:31
@kumaraditya303 kumaraditya303 merged commit 26a1cd4 into python:main Jun 2, 2025
43 checks passed
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot ARM Raspbian 3.x (tier-3) has failed when building commit 26a1cd4.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/424/builds/10920) and take a look at the build logs.
  4. Check if the failure is related to this commit (26a1cd4) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/424/builds/10920

Failed tests:

  • test.test_concurrent_futures.test_interpreter_pool

Failed subtests:

  • test_create_connection_local_addr_nomatch_family - test.test_asyncio.test_events.PollEventLoopTests.test_create_connection_local_addr_nomatch_family
  • test_map_buffersize_on_infinite_iterable - test.test_concurrent_futures.test_interpreter_pool.InterpreterPoolExecutorTest.test_map_buffersize_on_infinite_iterable
  • test_map_buffersize_on_multiple_infinite_iterables - test.test_concurrent_futures.test_interpreter_pool.InterpreterPoolExecutorTest.test_map_buffersize_on_multiple_infinite_iterables

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_concurrent_futures/executor.py", line 138, in test_map_buffersize_on_multiple_infinite_iterables
    self.assertEqual(next(res, None), 0)
                     ~~~~^^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/_base.py", line 666, in result_iterator
    yield _result_or_cancel(fs.pop())
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/_base.py", line 311, in _result_or_cancel
    return fut.result(timeout)
           ~~~~~~~~~~^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/_base.py", line 450, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/_base.py", line 395, in __get_result
    raise self._exception
concurrent.futures.interpreter.BrokenInterpreterPool: A thread initializer failed, the thread pool is not usable anymore


Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_asyncio/test_events.py", line 832, in test_create_connection_local_addr_nomatch_family
    with self.assertRaises(OSError):
         ~~~~~~~~~~~~~~~~~^^^^^^^^^
AssertionError: OSError not raised


Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/thread.py", line 99, in _worker
    ctx.initialize()
    ~~~~~~~~~~~~~~^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/interpreter.py", line 115, in initialize
    self.interpid = _interpreters.create(reqrefs=True)
                    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
interpreters.InterpreterError: interpreter creation failed


Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/support/__init__.py", line 829, in gc_collect
    gc.collect()
ResourceWarning: unclosed <socket.socket fd=5, family=10, type=1, proto=6, laddr=('::1', 35907, 0, 0), raddr=('::1', 35907, 0, 0)>
Warning -- Unraisable exception
Exception ignored while calling deallocator <function _SelectorTransport.__del__ at 0xf6524fa0>:
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/asyncio/selector_events.py", line 873, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: unclosed transport <_SelectorSocketTransport fd=5>


Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_concurrent_futures/executor.py", line 127, in test_map_buffersize_on_infinite_iterable
    self.assertEqual(next(res, None), "0")
                     ~~~~^^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/_base.py", line 666, in result_iterator
    yield _result_or_cancel(fs.pop())
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/_base.py", line 311, in _result_or_cancel
    return fut.result(timeout)
           ~~~~~~~~~~^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/_base.py", line 450, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/_base.py", line 395, in __get_result
    raise self._exception
concurrent.futures.interpreter.BrokenInterpreterPool: A thread initializer failed, the thread pool is not usable anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants