Skip to content

bpo-46070: _PyGC_Fini() untracks objects #30577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 13, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
:c:func:`Py_EndInterpreter` now explicitly untracks all objects currently
tracked by the GC. Previously, if an object was used later by another
interpreter, calling :c:func:`PyObject_GC_UnTrack` on the object crashed if the
previous or the next object of the :c:type:`PyGC_Head` structure became a
dangling pointer. Patch by Victor Stinner.
24 changes: 24 additions & 0 deletions Modules/gcmodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -2161,12 +2161,36 @@ _PyGC_DumpShutdownStats(PyInterpreterState *interp)
}
}


static void
gc_fini_untrack(PyGC_Head *list)
{
PyGC_Head *gc;
for (gc = GC_NEXT(list); gc != list; gc = GC_NEXT(list)) {
PyObject *op = FROM_GC(gc);
_PyObject_GC_UNTRACK(op);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one or more objects "owned" by another interpreter (AKA OTHER) could still hold a reference to one of these untracked objects (AKA UNTRACKED). The bug we're fixing demonstrates that's a possibility.

Could it make it harder to break cycles? Could it lead to memory leaks?

How will the UNTRACKED object impact GC for any OTHER object holding a reference to it?

What happens if one of these UNTRACKED objects is involved in a cycle with one or more OTHER objects? Can we still clean them all up?

What happens if several of these UNTRACKED objects are involved in a cycle, but one or more OTHER objects holds a reference to one of them? Can we still clean them all up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an object is not tracked by the GC and is part of a ref cycle, the GC is unable to break the reference cycle, and so yes, memory leaks. Previously, Python was crashing. Well, the situation is less bad :-)

This change doesn't introduce the memory leak. An object cannot be tracked in two GC lists at the same time: _PyObject_GC_TRACK() has an assertion for that. The leak was already there.

If an object is created in interpreter 1, it's tracked by the GC of the interpreter 1. If it's copied to the interpreter 2 and the interpreter 1 is destroyed, the interpreter 2 GC is not going to automatically tracks the object. Moreover, the interpreter 1 cannot guess if another interpreter is using the object or not.

IMO untracking all objects is the least bad solution.

IMO the only way to ensure that no memory is leaked is to prevent sharing objects between interpreters, and rely on existing mechanisms (GC and finalizers) to release memory. So continue to convert static types to heap types, continue to update C extensions to the multi-phase initialization, continue moving globals into module state and per-interpreter structures, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so yes, memory leaks. Previously, Python was crashing. Well, the situation is less bad :-)

🙂

Copy link
Member Author

@vstinner vstinner Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python 3.8, when the GC state was shared, it seems like any interpreter could run a GC collection. It couldn't happen in parallel, thanks to the "collecting" flag. The GC is not re-entrant and simply does nothing (exit) if it's already collecting.

I cannot say if Python 3.8 was able to break the reference cycles that Python 3.9 and newer can no longer break: when an object in created in an interpreter and then "migrates" to another interpreter.

}
}


void
_PyGC_Fini(PyInterpreterState *interp)
{
GCState *gcstate = &interp->gc;
Py_CLEAR(gcstate->garbage);
Py_CLEAR(gcstate->callbacks);

if (!_Py_IsMainInterpreter(interp)) {
// bpo-46070: Explicitly untrack all objects currently tracked by the
// GC. Otherwise, if an object is used later by another interpreter,
// calling PyObject_GC_UnTrack() on the object crashs if the previous
// or the next object of the PyGC_Head structure became a dangling
// pointer.
Comment on lines +2184 to +2188
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this change would be unnecessary no objects were ever shared. It may be worth adding such a note to this comment, so it's clear that this extra cleanup code could be removed at that point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we will be sure that it is really impossible to share any object, we can just remove this code.

for (int i = 0; i < NUM_GENERATIONS; i++) {
PyGC_Head *gen = GEN_HEAD(gcstate, i);
gc_fini_untrack(gen);
}
}
}

/* for debugging */
Expand Down