Skip to content

fix(coderd): remove rate limits from agent metadata #9308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 24, 2023
Merged

fix(coderd): remove rate limits from agent metadata #9308

merged 1 commit into from
Aug 24, 2023

Conversation

ammario
Copy link
Member

@ammario ammario commented Aug 24, 2023

fix(coderd): remove rate limits from agent metadata

Include the full update message in the PubSub notification so that
we don't have to refresh metadata from the DB and can avoid rate
limiting.

return
}

err = api.Pubsub.Publish(watchWorkspaceAgentMetadataChannel(workspaceAgent.ID), datumJSON)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how limited the pubsub capacity is wrt payload size, but since we don't control the size (of e.g. values), we'll still need a different approach to propagate the value. Not against adding a few more fields to the publish though if they're helpful.

I see a test also caught this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I think I left a comment but it was lost.

The 32k limit was too much anyways so I dropped it down to 4k and used gob encoding to avoid inflation.

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Postgres isn't liking the Gob (wants UTF8). Other than that and a few other observations, I think this approach is fine. 👍🏻

maxValueLen = 32 << 10
// maxValueLen is set to 4096 to stay under the 8000 byte Postgres
// NOTIFY limit.
maxValueLen = 4096
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be both a value and an error, resulting in 2*4096?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah

}
pubsubUpdates <- update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sendMetadata takes a long time, there's a chance we'll fill up the 8 buffer slots we've got here and then hang.

We could instead append to a slice here and use a buffer of 1 instead to signal that there's new data in the slice to process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to cap the buffer size at something though, at which point we're back to the same problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean preventing the slice from growing unboundedly? I suppose we can just limit it's max size and discard the oldest once it has more than X entries? As for notifying we can just do a select with a default on a 1 size chan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I can actually just use a map on the key

// We always use the original Request context because it contains
// the RBAC actor.
lastDBMeta, err = api.Database.GetWorkspaceAgentMetadata(ctx, workspaceAgent.ID)
lastMetadata, err = api.Database.GetWorkspaceAgentMetadata(ctx, workspaceAgent.ID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails we should probably disconnect the client. Pull only happens once and without lastMetadata populated to all metadata entries, we will never return any info.

@ammario ammario force-pushed the pr9308 branch 2 times, most recently from 9ea6948 to 2bf7588 Compare August 24, 2023 17:50
@ammario
Copy link
Member Author

ammario commented Aug 24, 2023

2bf7588:

The new approach uses a mutex-protected map to store pending changes and avoids
backpressure problems.

The code is unfortunately complex, but couldn't think of an elegant channel structure.

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes but looks good. A sync.Map might help clean it up a bit.

// We avoid channel-based synchronization here to avoid backpressure problems.
var (
metadataMapMu sync.Mutex
metadataMap = make(map[string]database.WorkspaceAgentMetadatum, 128)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there always be 128 entries or would it be better to let it grow as needed?


var lastSend time.Time
sendMetadata := func() {
lastSend = time.Now()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be ideal to set this after the lock, that way it closer represents when we actually sent it.

defer refreshTicker.Stop()
// We send updates at most every second.
const sendInterval = time.Second * 1
sendTicker := time.NewTicker(sendInterval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A NewTimer might be better, coupled with Reset if we want to be literal with the comment above (at most every second). The next ticker could immediately be ready depending on how things went.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly every second was my intention — updated the comment.

httpapi.InternalServerError(rw, err)
return
}
defer cancelSub()
for _, datum := range md {
metadataMap[datum.Key] = datum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably hold the mutex lock here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah 🤦🏽‍♂️

Include the full update message in the PubSub notification so that
we don't have to refresh metadata from the DB and can avoid rate
limiting.
@ammario ammario merged commit 630ec55 into main Aug 24, 2023
@ammario ammario deleted the pr9308 branch August 24, 2023 20:18
@github-actions github-actions bot locked and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants