
fix: stop holding Pubsub mutex while calling pq.Listener #12518

Merged Mar 12, 2024 (1 commit)

@spikecurtis spikecurtis commented Mar 11, 2024

fixes #11950

#11950 (comment) explains the bug

We were also calling into Unlisten() and Close() while holding the mutex. I don't believe that Close() depends on the notification loop being unblocked, but it's hard to be sure, so the safest thing to do is assume it could block and never call it with the mutex held.

So I added a unit test that fakes out pq.Listener and sends a burst of notifications every time we call into it. This should catch any regression where we hold the mutex while calling into these functions.
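A minimal sketch of the idea behind the fix and its test, under stated assumptions: `fakeListener`, `pubsub`, `subscribeWithin`, and `testTimeout` are hypothetical names for illustration and are not the code from this PR. The fake blocks inside `Listen` until a receive loop drains several notifications, and that loop needs the mutex for each one. If `Subscribe` still held the mutex while calling `Listen`, the call would never return and the timeout would fire.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// testTimeout bounds how long we wait before declaring a deadlock.
const testTimeout = time.Second

// fakeListener is a hypothetical stand-in for pq.Listener. Every call into it
// pushes several notifications and blocks until the receive loop takes them.
type fakeListener struct {
	notify chan string // unbuffered, so each send waits for the loop
}

func (f *fakeListener) Listen(channel string) error {
	for i := 0; i < 5; i++ {
		f.notify <- channel
	}
	return nil
}

// pubsub is a heavily simplified, hypothetical pubsub.
type pubsub struct {
	mu       sync.Mutex
	subs     map[string]int // subscriber count per channel
	received int            // notifications handled by the loop
	l        *fakeListener
}

func newPubsub(l *fakeListener) *pubsub {
	p := &pubsub{subs: map[string]int{}, l: l}
	go p.loop()
	return p
}

// loop drains notifications and takes the mutex for every one of them.
func (p *pubsub) loop() {
	for range p.l.notify {
		p.mu.Lock()
		p.received++
		p.mu.Unlock()
	}
}

// Subscribe updates shared state under the mutex, releases it, and only then
// calls into the listener.
func (p *pubsub) Subscribe(channel string) error {
	p.mu.Lock()
	p.subs[channel]++
	p.mu.Unlock()
	return p.l.Listen(channel) // the mutex is NOT held here
}

// subscribeWithin reports whether Subscribe returned without error before d elapsed.
func subscribeWithin(p *pubsub, channel string, d time.Duration) bool {
	done := make(chan error, 1)
	go func() { done <- p.Subscribe(channel) }()
	select {
	case err := <-done:
		return err == nil
	case <-time.After(d):
		return false // most likely blocked on the listener
	}
}

func main() {
	p := newPubsub(&fakeListener{notify: make(chan string)})
	fmt.Println("subscribe returned in time:", subscribeWithin(p, "events", testTimeout))
}
```

Moving `p.mu.Unlock()` below the `Listen` call in this sketch reproduces the hang: the loop receives the first notification, waits for the mutex, and the second send in `Listen` never completes.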

This PR also removes the use of a context.Context to stop the PubSub; it must now be stopped explicitly by calling Close(). This simplifies much of the logic and matches how we already use the pubsub.
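For illustration, here is a rough sketch of an explicit-`Close()` lifecycle without a context. The `pubsub` type, its fields, and `errClosed` are assumptions made for this example and do not reflect the actual implementation in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is a hypothetical sentinel returned after Close.
var errClosed = errors.New("pubsub closed")

// pubsub is a hypothetical type whose lifetime is controlled only by Close.
type pubsub struct {
	mu     sync.Mutex
	closed bool
	done   chan struct{}  // closed exactly once, by Close
	wg     sync.WaitGroup // tracks the background loop
}

func newPubsub() *pubsub {
	p := &pubsub{done: make(chan struct{})}
	p.wg.Add(1)
	go p.loop()
	return p
}

// loop stands in for a notification loop; here it only waits for shutdown.
func (p *pubsub) loop() {
	defer p.wg.Done()
	<-p.done
}

// Publish fails once the pubsub has been closed.
func (p *pubsub) Publish(msg string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return errClosed
	}
	return nil // a real implementation would send msg here
}

// Close is idempotent. It flips the flag under the mutex, then waits for the
// loop outside the mutex so the loop is free to take the mutex while exiting.
func (p *pubsub) Close() error {
	p.mu.Lock()
	if p.closed {
		p.mu.Unlock()
		return nil
	}
	p.closed = true
	close(p.done)
	p.mu.Unlock()

	p.wg.Wait()
	return nil
}

func main() {
	p := newPubsub()
	fmt.Println("publish before close:", p.Publish("hello"))
	fmt.Println("close:", p.Close())
	fmt.Println("publish after close:", p.Publish("hello"))
}
```

With a single shutdown path, there is no need to reconcile a cancelled context with a concurrent `Close()` call, which is one way the logic can become simpler.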

@spikecurtis spikecurtis marked this pull request as ready for review March 11, 2024 11:55
@spikecurtis spikecurtis force-pushed the spike/11950-fix-deadlock branch 2 times, most recently from e3c30f7 to 194eb5f Compare March 11, 2024 13:04

@kylecarbs kylecarbs left a comment

Amazing you were able to find this. Happy the test keeps it from ever happening again!

@spikecurtis spikecurtis force-pushed the spike/11950-fix-deadlock branch from 194eb5f to 79b2e92 Compare March 12, 2024 05:25
@spikecurtis spikecurtis merged commit 5170744 into main Mar 12, 2024
@spikecurtis spikecurtis deleted the spike/11950-fix-deadlock branch March 12, 2024 05:44
@github-actions github-actions bot locked and limited conversation to collaborators Mar 12, 2024
Linked issue: PGCoord fails to get pubsub updates