-
Notifications
You must be signed in to change notification settings - Fork 887
feat: add single tailnet support to pgcoord #9351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
20a5183
to
4a4bbdb
Compare
coderd/database/migrations/000152_pg_coordinator_single_tailnet.up.sql
Outdated
Show resolved
Hide resolved
coderd/database/dbtestutil/db.go
Outdated
@@ -42,6 +43,9 @@ func NewDB(t testing.TB) (database.Store, pubsub.Pubsub) { | |||
}) | |||
db = database.New(sqlDB) | |||
|
|||
err = migrations.Up(sqlDB) | |||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is required, how were our tests working before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only if you specify a database URL to run tests against. I needed to run a standalone postgres to see the logs and previously it wouldn't run the migrations. I can prob remove this, I'm not sure if it has any greater effects.
coderd/database/migrations/000152_pg_coordinator_single_tailnet.up.sql
Outdated
Show resolved
Hide resolved
enterprise/tailnet/pgcoord_test.go
Outdated
require.NoError(t, err) | ||
|
||
assertEventuallyNoClientsForAgent(ctx, t, store, agent1.id) | ||
assertEventuallyNoAgents(ctx, t, store, agent1.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional test cases I'd like to see:
- MultiAgent with multiple coordinators
- MultiAgent
UpdateSelf
first, thenSubscribeAgent
--- verify the agent gets the update. UnsubscribeAgent
--- verify that the agent no longer gets updates about the client and vice versa- Subscribe to multiple agents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll work on implementing these after the rest is solid 👍
1cd56e6
to
a7b39df
Compare
delete(q.conns, c) | ||
|
||
if _, ok := q.clientSubscriptions[c.UniqueID()]; !ok { | ||
q.clientSubscriptions[c.UniqueID()] = map[uuid.UUID]struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a memory leak: you add to q.clientSubscriptions
but never remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still never delete anything from q.clientSubscriptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, I think I got this now.
coderd/database/migrations/000155_pg_coordinator_single_tailnet.up.sql
Outdated
Show resolved
Hide resolved
coderd/database/migrations/000155_pg_coordinator_single_tailnet.up.sql
Outdated
Show resolved
Hide resolved
delete(q.conns, c) | ||
|
||
if _, ok := q.clientSubscriptions[c.UniqueID()]; !ok { | ||
q.clientSubscriptions[c.UniqueID()] = map[uuid.UUID]struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still never delete anything from q.clientSubscriptions
enterprise/tailnet/pgcoord.go
Outdated
kind: c.Kind(), | ||
} | ||
cm, ok := q.mappers[mk] | ||
if ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fact that you're checking ok
here smells funny and indicates that we're not keeping the data structures in sync.
It should be an invariant in this code that a client subscription in the q.clientSubscriptions
map implies that a corresponding mapper exists. They are always added together and always removed together while holding the mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you think we should handle a case where the mapper doesn't exist. Panic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, panic, or at the very least, drop a Critical
log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a pretty easy way to trigger this invariant. It's possible for (*querier).cleanupConn
to race with a call to removeClientSubscription
, such that removeClientSubscription
is trying to remove a subscription that no longer exists. I'm not too sure how to prevent this, except by ignoring duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The invariant is that a client subscription exists in q.clientSubscriptions
AND in the mappers, or neither. It can never be in one but not the other when the mutex is unlocked.
You're right though, that in this design, it's possible for removeClientSubscription
to race with cleanupConn
such that you can try to remove a subscription that was already cleaned up. What you've done is fine and follows the invariant --- if the subscription is not in q.clientSubscriptions
that means it's already cleaned up from the mappers too, and removeClientSubscription
can return early.
if sub.active { | ||
q.newClientSubscription(sub.q, sub.agentID) | ||
} else { | ||
q.removeClientSubscription(sub.q, sub.agentID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you send a subscription with agentID unset on remove. The subscriber handles this case, but then it forwards it to the querier which I don't think handles the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's technically handled by checking the clientSubscriptions
map, since there will never be an agent subscription with uuid.Nil
. I had an explicit check but just replaced it with the map lookup, since it essentially does the same thing. Might be good to comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This adds support for the single tailnet expieriment to pgcoord. Afer this there aren't any major blockers for making it GA.