Skip to content

multiprocessing forkserver main contains dead main_path= handling code #126631

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gpshead opened this issue Nov 9, 2024 · 2 comments
Open

multiprocessing forkserver main contains dead main_path= handling code #126631

gpshead opened this issue Nov 9, 2024 · 2 comments
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes stdlib Python modules in the Lib dir topic-multiprocessing type-bug An unexpected behavior, bug, or error

Comments

@gpshead
Copy link
Member

gpshead commented Nov 9, 2024

Bug report

Bug description:

The multiprocessing module's forkserver start method multiprocessing.forkserver.main has two arguments that are values passed from the parent process to the forkserver when first launching it. main_path= and sys_path=. Via code inspection while fixing and testing #126538:

There is no possible way for main_path= to be set on the multiprocessing's forkserver.main() call since 2013's 9a76735 for #64145 as shipped in 3.4 (which also introduced the forkserver feature).

the forkserver.main(...) call is constructed in Lib/multiprocessing/forkserver.py getting its two possible allowed named args from spawn.get_preparation_data() in Lib/multiprocessing/spawn.py which was changed to set "init_main_from_name" in the kwargs dict it returns instead of "main_path" in the above change. Effectively making the main_path= handling code dead in forkserver.main.

We should either remove the dead code, or determine if it was intended to support something that has been silently broken when using the forkserver start method for the past 11 years, and if so, add an explicit test for that and adapt it to use "init_main_from_name" as the spawn code does.

CPython versions tested on:

3.12, 3.13, 3.14, CPython main branch

Operating systems tested on:

Linux

Linked PRs

@gpshead gpshead added type-bug An unexpected behavior, bug, or error 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes labels Nov 9, 2024
@duaneg
Copy link
Contributor

duaneg commented Jun 9, 2025

It did silently break something: __main__ can no longer be pre-loaded.

import multiprocessing
import os

print(f"importing {__name__} ({__file__}) pid={os.getpid()}")

def runner():
    print(f"running in {os.getpid()}")

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    multiprocessing.set_forkserver_preload(['__main__'])

    N=4
    procs = [multiprocessing.Process(target=runner) for _ in range(N)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Currently:

importing __main__ (/home/duaneg/src/cpython/data/126631/repro.py) pid=946571
importing __mp_main__ (/home/duaneg/src/cpython/data/126631/repro.py) pid=946574
running in 946574
importing __mp_main__ (/home/duaneg/src/cpython/data/126631/repro.py) pid=946575
running in 946575
importing __mp_main__ (/home/duaneg/src/cpython/data/126631/repro.py) pid=946576
running in 946576
importing __mp_main__ (/home/duaneg/src/cpython/data/126631/repro.py) pid=946577
running in 946577

If we fix it (and presumably before 3.4, although I haven't tested):

importing __main__ (/home/duaneg/src/cpython/data/126631/repro.py) pid=946458
importing __mp_main__ (/home/duaneg/src/cpython/data/126631/repro.py) pid=946460
running in 946461
running in 946462
running in 946463
running in 946464

Having said that, no-one noticing for eleven years suggests it isn't actually required. Perhaps it would be better to simplify things and remove the code? I'll post a quick and dirty fix, for reference.

duaneg added a commit to duaneg/cpython that referenced this issue Jun 9, 2025
The `main_path` parameter was renamed `init_main_from_name`, update the
forkserver code accordingly.
@duaneg
Copy link
Contributor

duaneg commented Jun 10, 2025

FWIW one place where __main__ preloading is explicitly used "to speed up tests" are the fork server tests themselves. The fixed version does indeed use a little less CPU (a quick & crude check shows ~2% improvement), however the actual elapsed time is not significantly affected as these tests spend most of their time waiting and sleeping.

In fact, the fork server's default preload list is ['__main__'], so this bug affects all uses of fork server that don't explicitly set preload to not include __main__. Given that, I am more inclined to think we should fix this: CPU/energy savings may be small for each individual use, but will quickly add up across all fork server use globally. However, it also raises the risk, as it will affect all multiprocessing users who don't override the default settings.

In trying to reshape the repro script into a regression test, I ran into some truly bizarre behaviour: it was working when run directly but produced spurious output when run as a unit test, even with the exact same command line and environment! It turns out the fork server is not flushing stdout before forking, so if any of the modules it preloads writes to a buffered stdout/stderr but doesn't flush, it will be inherited by the child process(es). This is partially masked by this bug, since it is most likely to be an issue for the __main__ module. I'll open a separate issue for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes stdlib Python modules in the Lib dir topic-multiprocessing type-bug An unexpected behavior, bug, or error
Projects
Status: No status
Development

No branches or pull requests

3 participants