fix: fix pgcoord to delete coordinator row last #12155

Merged on Feb 15, 2024 (1 commit)

@spikecurtis (Contributor) commented Feb 15, 2024

Fixes #12141
Fixes #11750

PGCoord shutdown was uncoordinated, so an update that arrived at an inopportune time during shutdown would be rejected because the coordinator row had already been deleted.

This PR ensures that the PGCoord subcomponents that write updates are shut down before we take down the heartbeats subcomponent, which is responsible for deleting the coordinator row.
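
The ordering described above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the `coordinator` type, its fields, and the recorded event strings are all hypothetical, and the row deletion is simulated by appending to a slice instead of touching a database.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// coordinator is a hypothetical stand-in for PGCoord. None of these names
// are taken from the PR.
type coordinator struct {
	mu     sync.Mutex
	events []string // shutdown events, in the order they happened

	cancelWriters context.CancelFunc
	writersWG     sync.WaitGroup
}

func (c *coordinator) record(e string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.events = append(c.events, e)
}

// newCoordinator starts numWriters goroutines that stand in for the
// subcomponents that write updates.
func newCoordinator(numWriters int) *coordinator {
	ctx, cancel := context.WithCancel(context.Background())
	c := &coordinator{cancelWriters: cancel}
	c.writersWG.Add(numWriters)
	for i := 0; i < numWriters; i++ {
		go func() {
			defer c.writersWG.Done()
			<-ctx.Done() // a real writer would process updates until canceled
			c.record("writer stopped")
		}()
	}
	return c
}

// Close stops every writer and waits for it to exit before simulating the
// heartbeat shutdown that deletes the coordinator row.
func (c *coordinator) Close() []string {
	c.cancelWriters()
	c.writersWG.Wait()
	c.record("coordinator row deleted")

	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.events...)
}

func main() {
	fmt.Println(newCoordinator(2).Close())
}
```

Because `Close` waits on the writers before recording the deletion, no writer in this sketch can observe a missing row.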

@spikecurtis spikecurtis marked this pull request as ready for review February 15, 2024 09:58

@mafredri (Member) left a comment

Flagged one thing, but other than that, LGTM.

@@ -454,6 +474,9 @@ func newBinder(ctx context.Context,
		workQ: newWorkQ[bKey](ctx),
	}
	go b.handleBindings()
	// add to the waitgroup immediately to avoid any races waiting for it before
	// the workers start.
	b.workerWG.Add(numBinderWorkers)
Member

Is there a chance that <-startWorkers below (i.e. fHB) never gets closed (e.g. because of some error during startup), so that these waitgroups never resolve?

(I didn't try to dig in as to how or where fHB is closed as it's not obvious from this PR.)

Contributor Author

It gets closed unconditionally after we send the first heartbeat, whether that heartbeat succeeds or fails.

@spikecurtis spikecurtis merged commit 627232e into main Feb 15, 2024
@spikecurtis spikecurtis deleted the spike/12141-flake-write-binding branch February 15, 2024 12:34
@github-actions github-actions bot locked and limited conversation to collaborators Feb 15, 2024
2 participants