bpo-36977: Make SharedMemoryManager release its resources if its parent process dies #13451


Open · wants to merge 5 commits into main
11 changes: 11 additions & 0 deletions Lib/multiprocessing/managers.py
@@ -159,6 +159,11 @@ def __init__(self, registry, address, authkey, serializer):
        self.id_to_local_proxy_obj = {}
        self.mutex = threading.Lock()

    def _track_parent(self):
        if process.parent_process() is not None:
            process.parent_process().join()
            self.stop_event.set()

    def serve_forever(self):
        '''
        Run the server forever
@@ -169,6 +174,12 @@ def serve_forever(self):
        accepter = threading.Thread(target=self.accepter)
        accepter.daemon = True
        accepter.start()

        if process.parent_process() is not None:
            watcher = threading.Thread(target=self._track_parent)
            watcher.daemon = True
            watcher.start()

        try:
            while not self.stop_event.is_set():
                self.stop_event.wait(1)
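For context, the watcher thread added above relies on multiprocessing.parent_process() (available since Python 3.8): calling join() on the handle it returns only comes back once the parent process has exited. A minimal standalone sketch of that mechanism, with illustrative names (child_main, _watch_parent) that are not part of this patch:

import multiprocessing
import os
import threading
import time

def _watch_parent(stop_event):
    parent = multiprocessing.parent_process()   # None only in the main process
    if parent is not None:
        parent.join()        # returns once the parent process has exited
        stop_event.set()     # then ask the serving loop to shut down

def child_main():
    stop_event = threading.Event()
    threading.Thread(target=_watch_parent, args=(stop_event,), daemon=True).start()
    stop_event.wait()        # unblocks as soon as the parent dies
    print("parent is gone, cleaning up and exiting")

if __name__ == "__main__":
    multiprocessing.Process(target=child_main).start()
    time.sleep(1)            # let the child start its watcher thread
    os._exit(1)              # simulate an abrupt parent death (skips all cleanup)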
39 changes: 39 additions & 0 deletions Lib/test/_test_multiprocessing.py
@@ -3824,6 +3824,45 @@ def test_shared_memory_SharedMemoryServer_ignores_sigint(self):

        smm.shutdown()

    def test_shared_memory_SharedMemoryManager_cleanup_if_parent_dies(self):
        # If a process that spawned a SharedMemoryManager is terminated, the
        # manager process should notice, unlink the resources the manager
        # delivered, and shut down.
        cmd = '''if 1:
            import sys, time

            from multiprocessing.managers import SharedMemoryManager


            smm = SharedMemoryManager()
            smm.start()
            sl = smm.ShareableList([1, 2, 3])

            sys.stdout.write(f'{sl.shm.name}\\n')
            sys.stdout.flush()
            time.sleep(100)
            '''

        p = subprocess.Popen([sys.executable, '-E', '-c', cmd],
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        shm_name = p.stdout.readline().decode().strip()
        p.terminate()
        start_time = time.monotonic()
        deadline = start_time + 60
Member:

10 seconds sounds enough, no?

ogrisel (Contributor) commented on May 27, 2019:

On overloaded CI workers one can see weird things on rare occasions.

99.9% of the time it will probably take less than 10 ms (although I have not checked). This limit is just a timeout, so I think it's fine to keep a large value to make false-positive failures on CI servers extremely unlikely.

pitrou (Member) commented on May 27, 2019:

These are seconds here, not milliseconds.

Member:

But, no strong feelings either way.

        t = 0.1
        while time.monotonic() < deadline:
            time.sleep(t)
Member:

I don't feel comfortable having tests that rely on sleep; they are almost certain to fail on some of the buildbots (see #10700 for example). Could you restructure this test to use synchronization primitives?

Contributor (Author):

I understand your worries. I am having trouble thinking of a test that does not require a timeout, though (otherwise we risk hanging forever waiting for the synchronization event to occur). Inherently, this tests a feature that relies on connection.wait, which can block for a non-deterministic amount of time.

Note that using this pattern:

while time.monotonic() < deadline:
    time.sleep(t)
    if condition():
        break
else:
    raise AssertionError

is already more robust than a simple

time.sleep(dt)
self.assertTrue(condition())

It was discussed here: https://bugs.python.org/issue36867, and @vstinner seemed OK with its usage. I may be missing something though.

Member:

multiprocessing remains the module with the highest number of random failures due to sleeps/race conditions. I will dismiss my review if there is no other way, but I'm still uneasy about adding more sleep() :(
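For illustration only, the deadline-and-poll pattern discussed in the thread above could be factored into a small helper. wait_until and its parameters are hypothetical, not part of this PR or the test suite:

import time

def wait_until(condition, timeout=60.0, interval=0.1):
    # Poll `condition` until it returns True or `timeout` seconds elapse.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise AssertionError(f"condition not met within {timeout} seconds")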

            # The shared_memory segment should be released by the
            # SharedMemoryManager server process before it shuts down.
            try:
                sl = shared_memory.SharedMemory(shm_name, create=False)
            except FileNotFoundError:
                break
        else:
            raise AssertionError(
                "A SharedMemoryManager prevented some resources from being "
                "cleaned up after its parent process was terminated")

    @unittest.skipIf(os.name != "posix", "resource_tracker is posix only")
    def test_shared_memory_SharedMemoryManager_reuses_resource_tracker(self):
        # bpo-36867: test that a SharedMemoryManager uses the
@@ -0,0 +1 @@
SharedMemoryManager server processes no longer block the release of shared memory segments if their parent process is unexpectedly terminated.
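For reference, a sketch of the ordinary SharedMemoryManager lifecycle this entry refers to (illustrative, not part of the change):

from multiprocessing.managers import SharedMemoryManager

with SharedMemoryManager() as smm:
    sl = smm.ShareableList([1, 2, 3])   # backed by a shared memory segment
    print(sl.shm.name)                  # name other processes can attach to
# Leaving the block shuts the manager down and unlinks its segments; with this
# change, the same cleanup also happens if the parent process dies abruptly.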