-
Notifications
You must be signed in to change notification settings - Fork 943
feat(site): use websocket connection for devcontainer updates #18808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Instead of polling every 10 seconds, we instead use a WebSocket connection for more timely updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds real-time devcontainer updates over WebSocket instead of polling, and updates related client and server code.
- Introduce
useAgentContainers
hook leveraging WebSockets in the frontend - Add WebSocket watch endpoints and SDK methods for container updates across site API, SDK, and agent code
- Replace polling in
AgentRow
with the new hook and update documentation and tests accordingly
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
site/src/modules/resources/useAgentContainers.ts | New React hook subscribing to container updates via WS |
site/src/modules/resources/useAgentContainers.test.tsx | Tests for the hook’s initial fetch behavior |
site/src/modules/resources/AgentRow.tsx | Switched from polling to WebSocket hook |
site/src/api/api.ts | Export watchAgentContainers for WebSocket route |
docs/reference/api/agents.md | Add API reference for the new watch endpoint |
codersdk/workspacesdk/agentconn.go | Add WatchContainers method to agent connection SDK |
codersdk/workspaceagents.go | Add WatchWorkspaceAgentContainers to SDK client |
coderd/workspaceagents.go | Implement WebSocket handler for watch endpoint |
coderd/coderd.go | Register /containers/watch route |
coderd/apidoc/swagger.json & docs.go | Document new watch route in Swagger and docs template |
agent/agentcontainers/api.go | Add watchContainers WS handler in agent API |
Comments suppressed due to low confidence (5)
docs/reference/api/agents.md:908
- This endpoint upgrades to a WebSocket connection rather than a plain HTTP GET; consider updating the code sample to show a WebSocket client (e.g.,
wscat
or JS) for clarity.
curl -X GET http://coder-server:8080/api/v2/workspaceagents/{workspaceagent}/containers/watch \
site/src/modules/resources/useAgentContainers.test.tsx:85
- Tests currently cover only the initial fetch; add a test that mocks WebSocket messages and verifies that
useAgentContainers
updates when a message arrives.
});
site/src/modules/resources/AgentRow.tsx:75
- The
queryClient
variable is declared but never used; consider removing this import and assignment to reduce clutter.
const queryClient = useQueryClient();
agent/agentcontainers/api.go:585
- The
slices
package is referenced here but not imported, causing a compile error. Addimport "slices"
or reference the correct package.
api.updateChans = slices.DeleteFunc(api.updateChans, func(ch chan struct{}) bool {
site/src/modules/resources/useAgentContainers.ts:28
- Calling
invalidateQueries
aftersetQueryData
will trigger an extra network request on each WebSocket message. You can remove the invalidation to prevent unnecessary refetches.
await queryClient.invalidateQueries({ queryKey });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backend changes LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pointed out a few gotchas but otherwise looks good, nice work!
agent/agentcontainers/api.go
Outdated
updateCh := make(chan struct{}, 1) | ||
defer func() { | ||
api.mu.Lock() | ||
close(updateCh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd make sense to move this to the defer below to keep the related code close to each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 6ce5c19
agent/agentcontainers/api.go
Outdated
return | ||
} | ||
|
||
ctx = api.ctx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do this here instead:
ctx, wsNetConn := codersdk.WebsocketNetConn(ctx, conn, websocket.MessageBinary)
defer wsNetConn.Close()
Note that we're not passing api.ctx
to WebsocketNetConn
but that's OK because of the final select (see below).
Then a few important changes following:
- Heartbeat doesn't change (needs
Ping
fromconn
) - We can omit
conn.Close
(handled viawsNetConn.Close
) - Encoder uses
wsNetConn
- Final select has one entry for
api.ctx
and another forctx
This follows practice we use elsewhere in the code-base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible? wsjson.NewEncoder
requires a *websocket.Conn
but wsNetConn
is a net.Conn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wsjson.NewEncoder
is a tiny wrapper, i'll just write the underlying logic using wsNetConn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 04a92a4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, I hadn't actually seen/used wsjson.NewEncoder
before. The CloseRead
it calls would be ideal to retain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have re-added this in 😄 096a85e
agent/agentcontainers/api.go
Outdated
@@ -583,8 +647,32 @@ func (api *API) updateContainers(ctx context.Context) error { | |||
api.mu.Lock() | |||
defer api.mu.Unlock() | |||
|
|||
var knownDevcontainers map[string]codersdk.WorkspaceAgentDevcontainer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Naming so nobody accidentally updates this map rather than api.knownDevcontainers
. Perhaps knownDevcontainersBackup
or something along those lines those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in cd0c2d5
coderd/workspaceagents.go
Outdated
return | ||
} | ||
|
||
ctx = api.ctx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See related comments on api.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 04a92a4
codersdk/workspacesdk/agentconn.go
Outdated
defer res.Body.Close() | ||
} | ||
|
||
d := wsjson.NewDecoder[codersdk.WorkspaceAgentListContainersResponse](conn, websocket.MessageText, slog.Logger{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the repercussion of passing slog.Logger
here, unstructured output on stdout?
If it's required, perhaps we should require it as a WatchContainers
parameter if we can't put it on the AgentConn
struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely forgot to update this when I was tinkering 🤦♀️ Will resolve this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 3e50965
socket.addEventListener("message", (event) => { | ||
if (event.parseError) { | ||
displayError( | ||
"Unable to process latest data from the server. Please try refreshing the page.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this message be more specific? I.e. mention what operation failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly although I'm just following existing logic here. I'm not sure what context about the error exists to make this more useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that you are just following the existing logic, but I see this as an opportunity to do better with those messages, having in mind they are displayed to the users.
The displayError
function accepts a message and a description, so to make this error better to the user I think we can make the message shorter and tell to the user what is the impact of it, instead of the techinical reason, and give it an way to possible solve the problem as you already put in the message.
displayError(
"Failed to update containers",
"Please try refreshing the page"
)
Wdyt?
|
||
socket.addEventListener("error", () => { | ||
displayError( | ||
"Unable to get workspace containers. Connection has been closed.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we just give up here? Is there precedent for trying to reconnect later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the frontend changes I was following logic I found elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as my previous comment.
displayError(
"Failed to load containers",
"Please try refreshing the page"
)
d.WorkspaceFolder == other.WorkspaceFolder && | ||
d.Status == other.Status && | ||
d.Dirty == other.Dirty && | ||
d.Error == other.Error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also check container and agent, so something like this?
d.Error == other.Error | |
d.Error == other.Error && | |
(d.Container == nil && other.Container == nil || (d.Container != nil && other.Continer != nil && d.Container.ID == other.Container.ID)) && | |
(d.Agent == nil && ... d.Agent.ID == ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 001ccda
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see d.Agent
checked? IMO we should propagate changes to it as well since that's how the frontend associates the sub agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh silly me I only read half the sentence 🤦♀️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed again in 971f9d6 😅
coderd/workspaceagents.go
Outdated
|
||
// If the agent is unreachable, the request will hang. Assume that if we | ||
// don't get a response after 30s that the agent is unreachable. | ||
ctx, cancel := context.WithTimeout(ctx, 30*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: call this dialCtx
to avoid re-use. I believe we will always have a disconnect after 30s currently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 04a92a4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DanielleMaywood I'm missing some screenshots in the PR description showing the errors being displayed since we don't have stories in Storybook for that. Could you please add that? 🙏
I left a few comments related to the error messages.
socket.addEventListener("message", (event) => { | ||
if (event.parseError) { | ||
displayError( | ||
"Unable to process latest data from the server. Please try refreshing the page.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that you are just following the existing logic, but I see this as an opportunity to do better with those messages, having in mind they are displayed to the users.
The displayError
function accepts a message and a description, so to make this error better to the user I think we can make the message shorter and tell to the user what is the impact of it, instead of the techinical reason, and give it an way to possible solve the problem as you already put in the message.
displayError(
"Failed to update containers",
"Please try refreshing the page"
)
Wdyt?
|
||
socket.addEventListener("error", () => { | ||
displayError( | ||
"Unable to get workspace containers. Connection has been closed.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as my previous comment.
displayError(
"Failed to load containers",
"Please try refreshing the page"
)
expect(result.current).toBeUndefined(); | ||
}); | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a plus to have tests to verify:
- Parsing error
- Socket error
I know how annoying it can be to have this setup working properly without causing any flakes, so I wanted to let you know in case you are curious about, but it is not a blocket at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to add those.
I just want to give a fair warning: These tests are mostly written by Claude Sonnet 4 as my frontend testing experience is lacking. Please let me know if it has made any obvious blunders 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving BE with minor suggestions, as per my earlier comment FE XHR might still need some love.
Instead of polling every 10 seconds, we instead use a WebSocket connection for more timely updates.