fix(coderd): remove rate limits from agent metadata #9308
Conversation
```go
	return
}

err = api.Pubsub.Publish(watchWorkspaceAgentMetadataChannel(workspaceAgent.ID), datumJSON)
```
I don't know how limited the pubsub capacity is wrt payload size, but since we don't control the size (of e.g. values), we'll still need a different approach to propagate the value. Not against adding a few more fields to the publish though if they're helpful.
I see a test also caught this.
Hm, I think I left a comment but it was lost.
The 32k limit was too much anyway, so I dropped it down to 4k and used gob encoding to avoid inflation.
Looks like Postgres isn't liking the Gob (wants UTF8). Other than that and a few other observations, I think this approach is fine. 👍🏻
coderd/workspaceagents.go
Outdated
```diff
-	maxValueLen = 32 << 10
+	// maxValueLen is set to 4096 to stay under the 8000 byte Postgres
+	// NOTIFY limit.
+	maxValueLen = 4096
```
Can there be both a value and an error, resulting in 2*4096?
ah yeah
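One way to address the 2×4096 concern is to split the payload budget between the value and the error so both together can never exceed the limit. A sketch under that assumption; the constants and split strategy are illustrative, not necessarily what the PR ended up doing:

```go
package main

import "fmt"

const (
	// Assumed combined budget for one datum's value and error fields.
	maxPayloadLen = 4096
	// Each field gets half, so value+error stays under maxPayloadLen.
	maxValueLen = maxPayloadLen / 2
)

// truncate caps s at n bytes, leaving shorter strings untouched.
func truncate(s string, n int) string {
	if len(s) > n {
		return s[:n]
	}
	return s
}

func main() {
	long := string(make([]byte, 5000))
	fmt.Println(len(truncate(long, maxValueLen)), truncate("ok", maxValueLen))
}
```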
coderd/workspaceagents.go
Outdated
```go
}
pubsubUpdates <- update
```
If sendMetadata takes a long time, there's a chance we'll fill up the 8 buffer slots we've got here and then hang.
We could instead append to a slice here and use a buffer of 1 instead to signal that there's new data in the slice to process.
I think we want to cap the buffer size at something though, at which point we're back to the same problem.
Do you mean preventing the slice from growing unboundedly? I suppose we can just limit its max size and discard the oldest once it has more than X entries? As for notifying, we can just do a select with a default on a 1-size chan.
hmm I can actually just use a map on the key
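The map-on-the-key idea combined with the 1-buffered signal channel could look roughly like this. A sketch with illustrative names, not the PR's actual code: the map deduplicates per key (newer values simply overwrite older ones), and the non-blocking send on `signal` wakes the sender without ever stalling the producer:

```go
package main

import (
	"fmt"
	"sync"
)

type datum struct {
	Key, Value string
}

// pending holds updates keyed by metadata key; the map bounds memory at
// one entry per key, sidestepping the unbounded-slice problem.
type pending struct {
	mu      sync.Mutex
	updates map[string]datum
	signal  chan struct{}
}

func newPending() *pending {
	return &pending{
		updates: make(map[string]datum),
		signal:  make(chan struct{}, 1),
	}
}

func (p *pending) push(d datum) {
	p.mu.Lock()
	p.updates[d.Key] = d // newer values overwrite older ones for the same key
	p.mu.Unlock()
	select {
	case p.signal <- struct{}{}: // wake the sender
	default: // a wakeup is already queued; dropping this one is fine
	}
}

// drain atomically takes everything accumulated so far.
func (p *pending) drain() []datum {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := make([]datum, 0, len(p.updates))
	for _, d := range p.updates {
		out = append(out, d)
	}
	p.updates = make(map[string]datum)
	return out
}

func main() {
	p := newPending()
	p.push(datum{"cpu", "40%"})
	p.push(datum{"cpu", "42%"}) // overwrites the earlier cpu value
	fmt.Println(len(p.drain()))
}
```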
coderd/workspaceagents.go
Outdated
```diff
 // We always use the original Request context because it contains
 // the RBAC actor.
-lastDBMeta, err = api.Database.GetWorkspaceAgentMetadata(ctx, workspaceAgent.ID)
+lastMetadata, err = api.Database.GetWorkspaceAgentMetadata(ctx, workspaceAgent.ID)
```
If this fails we should probably disconnect the client. Pull only happens once, and without lastMetadata populated with all metadata entries, we will never return any info.
Force-pushed 9ea6948 to 2bf7588
The new approach uses a mutex-protected map to store pending changes and avoids channel-based synchronization. The code is unfortunately complex, but I couldn't think of an elegant channel structure.
A few notes, but looks good. A sync.Map might help clean it up a bit.
coderd/workspaceagents.go
Outdated
```go
// We avoid channel-based synchronization here to avoid backpressure problems.
var (
	metadataMapMu sync.Mutex
	metadataMap   = make(map[string]database.WorkspaceAgentMetadatum, 128)
```
Will there always be 128 entries or would it be better to let it grow as needed?
coderd/workspaceagents.go
Outdated
```go
var lastSend time.Time
sendMetadata := func() {
	lastSend = time.Now()
```
Might be ideal to set this after the lock; that way it more closely represents when we actually sent it.
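Concretely, the suggestion amounts to recording the timestamp only once the mutex is held, so lock contention doesn't skew it. A sketch with simplified stand-in types, not the PR's actual closure:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type sender struct {
	mu       sync.Mutex
	lastSend time.Time
	values   map[string]string
}

func (s *sender) sendMetadata() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Recorded under the lock, right before the snapshot, so it reflects
	// when we actually send rather than when we started waiting.
	s.lastSend = time.Now()
	keys := make([]string, 0, len(s.values))
	for k := range s.values {
		keys = append(keys, k)
	}
	return keys
}

func main() {
	s := &sender{values: map[string]string{"cpu": "42%"}}
	fmt.Println(len(s.sendMetadata()), !s.lastSend.IsZero())
}
```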
```go
defer refreshTicker.Stop()
// We send updates at most every second.
const sendInterval = time.Second * 1
sendTicker := time.NewTicker(sendInterval)
```
A NewTimer might be better, coupled with Reset, if we want to be literal with the comment above (at most every second). The next tick could be ready immediately depending on how things went.
Exactly every second was my intention — updated the comment.
```go
	httpapi.InternalServerError(rw, err)
	return
}
defer cancelSub()
for _, datum := range md {
	metadataMap[datum.Key] = datum
```
We should probably hold the mutex lock here?
ah yeah 🤦🏽♂️
fix(coderd): remove rate limits from agent metadata
Include the full update message in the PubSub notification so that
we don't have to refresh metadata from the DB and can avoid rate
limiting.