GH-125603: Don't count executing generators and coroutines as referrers in gc.get_referrers. #125640
asyncio changes LGTM
Objects/genobject.c
Outdated
@@ -58,7 +58,7 @@ gen_traverse(PyObject *self, visitproc visit, void *arg)
     PyGenObject *gen = _PyGen_CAST(self);
     Py_VISIT(gen->gi_name);
     Py_VISIT(gen->gi_qualname);
-    if (gen->gi_frame_state != FRAME_CLEARED) {
+    if (gen->gi_frame_state < FRAME_EXECUTING) {
I think this is okay, but it also adds some additional constraints that will make the GC in the free threading build more fragile. I think we have to make sure that any gen/coro marked as FRAME_EXECUTING is on a PyThreadState's frame stack -- otherwise some deferred references may not be visible to the GC and may be collected while still in use.
- There must not be any escaping calls between setting gi_frame_state = FRAME_EXECUTING and pushing the frame to the thread's stack.
- There must not be any escaping calls between popping the frame from the stack and setting gi_frame_state to some other value (i.e., exit_unwind -> clear_gen_frame).
All those things are already true, because we rely on gi_frame_state == FRAME_EXECUTING to guard against sending to an already executing generator.
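The guard mentioned here is observable from Python. The following is a minimal sketch, not code from the PR: the helper name make_reentrant is made up for illustration, and the generator simply tries to resume itself while its frame is executing.

```python
def make_reentrant():
    """Return a generator whose body tries to resume itself (illustrative helper)."""
    def body():
        # `gen` is bound in the enclosing scope before the first next() call.
        try:
            next(gen)  # the generator is executing at this point
        except ValueError as exc:
            # CPython refuses to resume a generator that is already executing.
            yield str(exc)

    gen = body()
    return gen


message = next(make_reentrant())
print(message)
```

The rejection happens before any frame is pushed, which is why the executing state has to be set and cleared in step with the frame stack.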
At shutdown we delete all PyThreadStates except for the main thread. Those threads could be running generators or coroutines stuck in some long running call (like a time.sleep()). We run the GC after that. That seems like it would be unsafe (because we're now hiding deferred _PyStackRefs from the GC).
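To make the scenario concrete, here is a small sketch of a generator that stays in the executing state while a non-main thread is blocked inside it. It does not reproduce the shutdown crash; it only shows the state the comment describes. The event names are illustrative, and an Event wait stands in for time.sleep() so the example is deterministic.

```python
import threading

started = threading.Event()
release = threading.Event()


def blocked():
    started.set()
    release.wait(timeout=5)  # stands in for a long-running call such as time.sleep()
    yield "done"


gen = blocked()
worker = threading.Thread(target=lambda: next(gen), daemon=True)
worker.start()

started.wait(timeout=5)
running_while_blocked = gen.gi_running  # the worker thread is inside the generator

release.set()
worker.join(timeout=5)
running_after = gen.gi_running  # the generator is now suspended at its yield

print(running_while_blocked, running_after)
```

If the interpreter shut down while the worker was still blocked, the generator would remain marked as executing even though its thread state is being torn down.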
This test program segfaults in the free threading build with this PR:
https://gist.github.com/colesbury/11d59b9987e881a3c016b086bb4ba1ff
If the generator is executing, it is part of the stack, not the heap.
So, I think tp_traverse should not be traversing executing generators.
@colesbury How does the free-threading GC find references that are on the stack (in normal frames)?
We should do that for executing generators.
Yes, during shutdown, we delete all PyThreadStates except for the main thread.
Let me rephrase:
When deleting the thread state, how do we clean up the references?
Presumably, we should be changing the state of the generator when we clean up its frame but we aren't.
To answer my own question: we don't clean up the references 🙁
We call PyThreadState_Clear() on the deleted thread states, but that doesn't clean up current_frame. We don't call PyThreadState_Delete() for reasons that are not clear to me. Even if PyThreadState_Clear() cleaned up current_frame, that wouldn't be sufficient because we unlink all the thread states before calling PyThreadState_Clear() and (by your definition) that already puts them in an invalid state.
This is all longstanding CPython behavior, as far as I can tell. Changing the shutdown behavior seems likely to cause different shutdown-related bugs than what we experience today.
I think we can work around this by encoding more knowledge about generators in gc_free_threading.c. I don't love that, but it seems less risky than messing with the shutdown behavior. Let me know if you want to go that route -- I can help with the gc_free_threading.c changes.
I'd like to investigate fixing up thread state cleanup. While it may well risk introducing bugs in the short term, I think it would be better long term. It is hard to optimize anything if we can't trust our supposed invariants.
A quick fix, until we decide on how to handle this long term, would be to traverse the frame list when deleting threads and mark all generators as suspended.
do {
    if (frame->owner == FRAME_OWNED_BY_GENERATOR) {
        PyGenObject *gen = _PyGen_GetGeneratorFromFrame(frame);
        gen->gi_frame_state = FRAME_ZOMBIE;
    }
    frame = frame->previous;
} while (frame != NULL);
It's not sufficient to mark generators as zombies in PyThreadState_Clear(). We call PyThreadState_Clear() one at a time on each non-main thread:
- At that point, they are already unlinked from the interpreter's list of PyThreadStates
- The PyThreadState_Clear() calls make escaping calls (Py_CLEAR()) that can trigger the GC
So the GC may still see generators that are marked as "running", but aren't in any accessible PyThreadState's frame stack.
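The failure mode can be sketched with a toy model. This is not how gc_free_threading.c is written: ToyGen, ToyThread, and visible_refs are hypothetical names, and the model only assumes that heap traversal skips executing generators while a separate stack scan covers the threads still linked to the interpreter.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGen:
    """Hypothetical stand-in for a generator and the references its frame holds."""
    executing: bool
    refs: list = field(default_factory=list)


@dataclass
class ToyThread:
    """Hypothetical stand-in for a thread state; `frames` lists generators on its stack."""
    frames: list = field(default_factory=list)


def visible_refs(heap_gens, linked_threads):
    """Return the references a collector following both rules would see."""
    seen = set()
    for gen in heap_gens:
        if not gen.executing:  # heap traversal skips executing generators
            seen.update(gen.refs)
    for thread in linked_threads:  # the stack scan covers linked threads only
        for gen in thread.frames:
            seen.update(gen.refs)
    return seen


gen = ToyGen(executing=True, refs=["obj"])
worker = ToyThread(frames=[gen])

before_unlink = visible_refs([gen], [worker])
after_unlink = visible_refs([gen], [])  # worker unlinked, gen still marked executing
print(before_unlink, after_unlink)
```

Once the thread is unlinked, neither path reaches the generator's references, which is the window in which a collection triggered by Py_CLEAR() would be unsafe.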
Skipping news as this is reverting behavior that never made it into a release.