Skip to content

feat: add single tailnet support to pgcoord #9351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Sep 21, 2023
Merged

Conversation

coadler
Copy link
Contributor

@coadler coadler commented Aug 25, 2023

This adds support for the single tailnet expieriment to pgcoord. Afer this there aren't any major blockers for making it GA.

@coadler coadler self-assigned this Aug 25, 2023
@coadler coadler force-pushed the colin/single-pgcoord branch from 20a5183 to 4a4bbdb Compare August 25, 2023 22:47
@coadler coadler marked this pull request as ready for review August 25, 2023 23:25
@coadler coadler requested a review from spikecurtis August 28, 2023 02:17
@@ -42,6 +43,9 @@ func NewDB(t testing.TB) (database.Store, pubsub.Pubsub) {
})
db = database.New(sqlDB)

err = migrations.Up(sqlDB)
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is required, how were our tests working before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only if you specify a database URL to run tests against. I needed to run a standalone postgres to see the logs and previously it wouldn't run the migrations. I can prob remove this, I'm not sure if it has any greater effects.

require.NoError(t, err)

assertEventuallyNoClientsForAgent(ctx, t, store, agent1.id)
assertEventuallyNoAgents(ctx, t, store, agent1.id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional test cases I'd like to see:

  1. MultiAgent with multiple coordinators
  2. MultiAgent UpdateSelf first, then SubscribeAgent --- verify the agent gets the update.
  3. UnsubscribeAgent --- verify that the agent no longer gets updates about the client and vice versa
  4. Subscribe to multiple agents

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on implementing these after the rest is solid 👍

@github-actions github-actions bot added the stale This issue is like stale bread. label Sep 7, 2023
@coadler coadler force-pushed the colin/single-pgcoord branch from 1cd56e6 to a7b39df Compare September 7, 2023 19:20
@matifali matifali removed the stale This issue is like stale bread. label Sep 7, 2023
@coadler coadler requested a review from spikecurtis September 8, 2023 00:33
delete(q.conns, c)

if _, ok := q.clientSubscriptions[c.UniqueID()]; !ok {
q.clientSubscriptions[c.UniqueID()] = map[uuid.UUID]struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a memory leak: you add to q.clientSubscriptions but never remove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still never delete anything from q.clientSubscriptions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I think I got this now.

@coadler coadler requested a review from spikecurtis September 16, 2023 01:49
delete(q.conns, c)

if _, ok := q.clientSubscriptions[c.UniqueID()]; !ok {
q.clientSubscriptions[c.UniqueID()] = map[uuid.UUID]struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still never delete anything from q.clientSubscriptions

kind: c.Kind(),
}
cm, ok := q.mappers[mk]
if ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fact that you're checking ok here smells funny and indicates that we're not keeping the data structures in sync.

It should be an invariant in this code that a client subscription in the q.clientSubscriptions map implies that a corresponding mapper exists. They are always added together and always removed together while holding the mutex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you think we should handle a case where the mapper doesn't exist. Panic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, panic, or at the very least, drop a Critical log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a pretty easy way to trigger this invariant. It's possible for (*querier).cleanupConn to race with a call to removeClientSubscription, such that removeClientSubscription is trying to remove a subscription that no longer exists. I'm not too sure how to prevent this, except by ignoring duplicates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The invariant is that a client subscription exists in q.clientSubscriptions AND in the mappers, or neither. It can never be in one but not the other when the mutex is unlocked.

You're right though, that in this design, it's possible for removeClientSubscription to race with cleanupConn such that you can try to remove a subscription that was already cleaned up. What you've done is fine and follows the invariant --- if the subscription is not in q.clientSubscriptions that means it's already cleaned up from the mappers too, and removeClientSubscription can return early.

@coadler coadler requested a review from spikecurtis September 20, 2023 04:14
if sub.active {
q.newClientSubscription(sub.q, sub.agentID)
} else {
q.removeClientSubscription(sub.q, sub.agentID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you send a subscription with agentID unset on remove. The subscriber handles this case, but then it forwards it to the querier which I don't think handles the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically handled by checking the clientSubscriptions map, since there will never be an agent subscription with uuid.Nil. I had an explicit check but just replaced it with the map lookup, since it essentially does the same thing. Might be good to comment.

@coadler coadler requested a review from spikecurtis September 20, 2023 15:39
Copy link
Contributor

@spikecurtis spikecurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@coadler coadler merged commit c900b5f into main Sep 21, 2023
@coadler coadler deleted the colin/single-pgcoord branch September 21, 2023 19:30
@github-actions github-actions bot locked and limited conversation to collaborators Sep 21, 2023
@coadler coadler linked an issue Sep 21, 2023 that may be closed by this pull request
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Single Tailnet: Implement for HA coordinator
3 participants