GH-125603: Don't count executing generators and coroutines as referrers in gc.gc_referrers. #125640

Open: wants to merge 2 commits into main

markshannon (Member) commented Oct 17, 2024

Skipping news as this is reverting behavior that never made it into a release.

kumaraditya303 (Contributor) left a comment

asyncio changes LGTM

```diff
@@ -58,7 +58,7 @@ gen_traverse(PyObject *self, visitproc visit, void *arg)
     PyGenObject *gen = _PyGen_CAST(self);
     Py_VISIT(gen->gi_name);
     Py_VISIT(gen->gi_qualname);
-    if (gen->gi_frame_state != FRAME_CLEARED) {
+    if (gen->gi_frame_state < FRAME_EXECUTING) {
```

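To make the effect of this one-line change concrete, here is a toy model of the traversal rule before and after. It uses hypothetical `toy_` types rather than CPython's real structures. It assumes, as the `<` comparison in the diff implies, that the created and suspended states sort below FRAME_EXECUTING; the numeric values are illustrative only. A counting visitor reports how many referents each rule exposes for one generator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified frame states. Only the ordering matters:
   "not started" and "suspended" sort before EXECUTING. */
typedef enum {
    TOY_CREATED = -2,
    TOY_SUSPENDED = -1,
    TOY_EXECUTING = 0,
    TOY_COMPLETED = 1,
    TOY_CLEARED = 4
} toy_frame_state;

typedef struct {
    void *name;
    void *qualname;
    void *frame_local;   /* stands in for everything the generator's frame references */
    toy_frame_state state;
} toy_gen;

typedef int (*toy_visitproc)(void *obj, void *arg);

/* new_rule == 0: old check (state != CLEARED).
   new_rule != 0: check from this PR (state < EXECUTING). */
static int toy_traverse(const toy_gen *gen, int new_rule, toy_visitproc visit, void *arg)
{
    int err;
    assert(gen != NULL);
    if ((err = visit(gen->name, arg)) != 0) return err;
    if ((err = visit(gen->qualname, arg)) != 0) return err;
    int visit_frame = new_rule ? gen->state < TOY_EXECUTING
                               : gen->state != TOY_CLEARED;
    if (visit_frame && (err = visit(gen->frame_local, arg)) != 0) return err;
    return 0;
}

/* Visitor that counts non-NULL referents (Py_VISIT also skips NULL). */
static int count_visit(void *obj, void *arg)
{
    if (obj != NULL) {
        ++*(int *)arg;
    }
    return 0;
}

static int count_referents(const toy_gen *gen, int new_rule)
{
    int n = 0;
    toy_traverse(gen, new_rule, count_visit, &n);
    return n;
}
```

Under the old rule an executing generator reports its frame's referents; under the new rule it reports only its name and qualname, so it no longer shows up as a referrer of the objects its running frame holds.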
Contributor

I think this is okay, but it also adds constraints that will make the GC in the free-threading build more fragile. I think we have to make sure that any gen/coro marked as FRAME_EXECUTING is on a PyThreadState's frame stack; otherwise, some deferred references may not be visible to the GC and may be collected while still in use.

  • There must not be any escaping calls between setting gi_frame_state = FRAME_EXECUTING and pushing the frame to the thread's stack.
  • There must not be any escaping calls between popping the frame from the stack and setting gi_frame_state to some other value (i.e., exit_unwind -> clear_gen_frame).
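The invariant described above can be sketched as a checker over a toy model. All `toy_` names are hypothetical, and each thread's frame stack is simplified to an array of owning generators. `toy_resume` and `toy_yield` do the state change and the push or pop back to back, with no escaping call in between. `toy_count_violations` reports generators that are marked executing but that no thread's stack reaches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states. */
typedef enum { TOY_SUSPENDED = -1, TOY_EXECUTING = 0 } toy_frame_state;

typedef struct { toy_frame_state state; } toy_gen;

#define TOY_MAX_DEPTH 16

typedef struct {
    toy_gen *frames[TOY_MAX_DEPTH];  /* generator owning each frame, innermost last */
    size_t depth;
} toy_thread;

/* Resume: mark EXECUTING and push, with no escaping call (nothing
   that could run the GC) in between. */
static void toy_resume(toy_thread *t, toy_gen *g)
{
    assert(g->state == TOY_SUSPENDED);
    assert(t->depth < TOY_MAX_DEPTH);
    g->state = TOY_EXECUTING;
    t->frames[t->depth++] = g;
}

/* Yield: pop and re-mark, again with no escaping call in between. */
static void toy_yield(toy_thread *t)
{
    assert(t->depth > 0);
    toy_gen *g = t->frames[--t->depth];
    assert(g != NULL && g->state == TOY_EXECUTING);
    g->state = TOY_SUSPENDED;
}

static bool toy_on_some_stack(const toy_thread *threads, size_t nthreads, const toy_gen *g)
{
    for (size_t i = 0; i < nthreads; i++)
        for (size_t d = 0; d < threads[i].depth; d++)
            if (threads[i].frames[d] == g) return true;
    return false;
}

/* Generators marked EXECUTING but unreachable from every thread's stack:
   a GC that skips executing generators would never see their frames. */
static size_t toy_count_violations(const toy_thread *threads, size_t nthreads,
                                   toy_gen *const *gens, size_t ngens)
{
    size_t bad = 0;
    for (size_t i = 0; i < ngens; i++)
        if (gens[i]->state == TOY_EXECUTING &&
            !toy_on_some_stack(threads, nthreads, gens[i]))
            bad++;
    return bad;
}
```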

Member Author

All those things are already true, because we rely on gi_frame_state == FRAME_EXECUTING to guard against sending to an already executing generator.

Contributor

At shutdown, we delete all PyThreadStates except for the main thread's. Those threads could be running generators or coroutines that are stuck in some long-running call (like time.sleep()). We run the GC after that. That seems like it would be unsafe, because we're now hiding deferred _PyStackRef references from the GC.
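Here is a toy model of this shutdown hazard. The `toy_` names are hypothetical; "deferred" here just means the frame holds the object without counting it, and the real free-threading GC is considerably more involved. The GC can find a deferred reference in two ways: by scanning the stacks of linked thread states, or through tp_traverse, which under this PR skips executing generators. Once shutdown unlinks the worker's thread state, neither path sees the reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int strong_refs; bool freed; } toy_obj;

/* Hypothetical, simplified states. */
typedef enum { TOY_SUSPENDED = -1, TOY_EXECUTING = 0 } toy_frame_state;

typedef struct {
    toy_frame_state state;
    toy_obj *deferred;   /* uncounted reference held by the generator's frame */
} toy_gen;

typedef struct {
    toy_gen *running;    /* generator whose frame is on this thread's stack, or NULL */
    bool linked;         /* still in the interpreter's list of thread states? */
} toy_thread;

static bool toy_deferred_visible(const toy_obj *obj,
                                 const toy_thread *threads, size_t nthreads,
                                 toy_gen *const *gens, size_t ngens)
{
    /* Stack scan: only linked thread states are walked. */
    for (size_t i = 0; i < nthreads; i++)
        if (threads[i].linked && threads[i].running != NULL &&
            threads[i].running->deferred == obj)
            return true;
    /* Heap scan, tp_traverse style: executing generators are skipped. */
    for (size_t i = 0; i < ngens; i++)
        if (gens[i]->state < TOY_EXECUTING && gens[i]->deferred == obj)
            return true;
    return false;
}

/* Free obj if it has no strong refs and no deferred ref the GC can see. */
static void toy_collect(toy_obj *obj, const toy_thread *threads, size_t nthreads,
                        toy_gen *const *gens, size_t ngens)
{
    assert(obj != NULL);
    if (obj->strong_refs == 0 &&
        !toy_deferred_visible(obj, threads, nthreads, gens, ngens))
        obj->freed = true;
}
```

In the usage below, the object survives while the worker's thread state is linked and is freed, still in use, once it is unlinked.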

Contributor

This test program segfaults in the free-threading build with this PR:

https://gist.github.com/colesbury/11d59b9987e881a3c016b086bb4ba1ff

Member Author

If the generator is executing, it is part of the stack, not the heap.
So, I think tp_traverse should not traverse executing generators.

@colesbury
How does the free-threading GC find references that are on the stack (in normal frames)?
We should do the same for executing generators.
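One way to picture the stack scan this question asks about is a walk over each thread's frame chain via a `previous` pointer, the same link the `frame->previous` loop later in this thread follows. This is a hypothetical toy sketch, not the free-threading GC's actual code. An executing generator's frame sits on the chain, so it is visited like any other frame.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_SIZE 4

typedef struct toy_frame {
    struct toy_frame *previous;   /* caller's frame */
    int owned_by_generator;       /* frame lives inside a generator object */
    void *stack[TOY_STACK_SIZE];  /* locals and value-stack slots */
    size_t stacktop;              /* number of live slots */
} toy_frame;

typedef int (*toy_visitproc)(void *obj, void *arg);

/* Visit every live slot of every frame on one thread's chain, starting at
   the innermost frame. Generator-owned frames get no special treatment:
   while the generator executes, its frame is simply part of this chain. */
static int toy_visit_thread_stack(toy_frame *current, toy_visitproc visit, void *arg)
{
    for (toy_frame *f = current; f != NULL; f = f->previous) {
        assert(f->stacktop <= TOY_STACK_SIZE);
        for (size_t i = 0; i < f->stacktop; i++) {
            if (f->stack[i] != NULL) {
                int err = visit(f->stack[i], arg);
                if (err) return err;
            }
        }
    }
    return 0;
}

/* Visitor that counts the references it is shown. */
static int toy_count(void *obj, void *arg)
{
    (void)obj;
    ++*(int *)arg;
    return 0;
}
```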

Contributor

Yes, during shutdown, we delete all PyThreadStates except for the main thread.

Member Author

Let me rephrase:
When deleting the thread state, how do we clean up the references?

Presumably, we should be changing the state of the generator when we clean up its frame, but we aren't.

Member Author

To answer my own question: we don't clean up the references 🙁

Contributor

We call PyThreadState_Clear() on the deleted thread states, but that doesn't clean up current_frame. We don't call PyThreadState_Delete(), for reasons that are not clear to me. Even if PyThreadState_Clear() cleaned up current_frame, that wouldn't be sufficient, because we unlink all the thread states before calling PyThreadState_Clear() and (by your definition) that already puts them in an invalid state.

This is all longstanding CPython behavior, as far as I can tell. Changing the shutdown behavior seems likely to cause different shutdown-related bugs than what we experience today.

I think we can work around this by encoding more knowledge about generators in gc_free_threading.c. I don't love that, but it seems less risky than messing with the shutdown behavior. Let me know if you want to go that route; I can help with the gc_free_threading.c changes.

Member Author

I'd like to investigate fixing thread-state cleanup. While it may well risk introducing bugs in the short term, I think it would be better in the long term. It is hard to optimize anything if we can't trust our supposed invariants.

A quick fix, until we decide on how to handle this long term, would be to traverse the frame list when deleting threads and mark all generators as suspended.

markshannon changed the title from "GH-125603: Don't count executing generators and coroutines as referrers in gc.gt_referrers." to "GH-125603: Don't count executing generators and coroutines as referrers in gc.gc_referrers." on Oct 22, 2024
Comment on lines +1654 to +1660
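```c
/* Walk this thread's frame chain; for each frame owned by a generator,
   set the owning generator's state to FRAME_ZOMBIE. */
do {
    if (frame->owner == FRAME_OWNED_BY_GENERATOR) {
        PyGenObject *gen = _PyGen_GetGeneratorFromFrame(frame);
        gen->gi_frame_state = FRAME_ZOMBIE;
    }
    frame = frame->previous;
} while (frame != NULL);
```
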
Contributor

It's not sufficient to mark generators as zombies in PyThreadState_Clear(). We call PyThreadState_Clear() one at a time on each non-main thread:

  • At that point, they are already unlinked from the interpreter's list of PyThreadStates
  • The PyThreadState_Clear() calls make escaping calls (Py_CLEAR()) that can trigger the GC

So the GC may still see generators that are marked as "running", but aren't in any accessible PyThreadState's frame stack.
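The ordering problem can be sketched with a toy model. The `toy_` names are hypothetical, and the escaping Py_CLEAR() call is modeled as an immediate GC check. All worker thread states are unlinked first and then cleared one by one, and each clear marks its own generator as a zombie. A GC triggered while the first worker is being cleared still sees the second worker's generator marked as executing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; only EXECUTING vs ZOMBIE matters here. */
typedef enum { TOY_EXECUTING, TOY_ZOMBIE } toy_frame_state;

typedef struct { toy_frame_state state; } toy_gen;

typedef struct {
    toy_gen *running;  /* generator executing on this thread, or NULL */
    bool linked;       /* in the interpreter's list of thread states? */
} toy_thread;

/* What a GC run would trip over: generators still marked EXECUTING
   whose thread state is no longer reachable from the interpreter. */
static size_t toy_gc_unsafe_count(const toy_thread *threads, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (!threads[i].linked && threads[i].running != NULL &&
            threads[i].running->state == TOY_EXECUTING)
            bad++;
    return bad;
}

/* Shutdown as described above (index 0 is the main thread): unlink every
   other thread state, then clear them one at a time. Each clear marks its
   generator as a zombie, then makes an escaping call that may run the GC.
   Returns the largest number of unsafe generators any such GC would see. */
static size_t toy_shutdown(toy_thread *threads, size_t n)
{
    assert(threads != NULL && n >= 1);
    for (size_t i = 1; i < n; i++)
        threads[i].linked = false;

    size_t worst = 0;
    for (size_t i = 1; i < n; i++) {
        if (threads[i].running != NULL)
            threads[i].running->state = TOY_ZOMBIE;      /* per-thread marking */
        size_t seen = toy_gc_unsafe_count(threads, n);   /* escaping call -> GC */
        if (seen > worst)
            worst = seen;
        threads[i].running = NULL;
    }
    return worst;
}
```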


3 participants