Skip to content

feat(site): use websocket connection for devcontainer updates #18808

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Jul 14, 2025

Conversation

DanielleMaywood
Copy link
Contributor

Instead of polling every 10 seconds, we instead use a WebSocket connection for more timely updates.

@DanielleMaywood DanielleMaywood requested a review from Copilot July 9, 2025 12:06
Copilot

This comment was marked as outdated.

@DanielleMaywood DanielleMaywood marked this pull request as ready for review July 9, 2025 13:39
@DanielleMaywood DanielleMaywood marked this pull request as draft July 9, 2025 13:50
@DanielleMaywood DanielleMaywood marked this pull request as ready for review July 10, 2025 08:21
@DanielleMaywood DanielleMaywood requested a review from Copilot July 10, 2025 09:07
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds real-time devcontainer updates over WebSocket instead of polling, and updates related client and server code.

  • Introduce useAgentContainers hook leveraging WebSockets in the frontend
  • Add WebSocket watch endpoints and SDK methods for container updates across site API, SDK, and agent code
  • Replace polling in AgentRow with the new hook and update documentation and tests accordingly

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
site/src/modules/resources/useAgentContainers.ts New React hook subscribing to container updates via WS
site/src/modules/resources/useAgentContainers.test.tsx Tests for the hook’s initial fetch behavior
site/src/modules/resources/AgentRow.tsx Switched from polling to WebSocket hook
site/src/api/api.ts Export watchAgentContainers for WebSocket route
docs/reference/api/agents.md Add API reference for the new watch endpoint
codersdk/workspacesdk/agentconn.go Add WatchContainers method to agent connection SDK
codersdk/workspaceagents.go Add WatchWorkspaceAgentContainers to SDK client
coderd/workspaceagents.go Implement WebSocket handler for watch endpoint
coderd/coderd.go Register /containers/watch route
coderd/apidoc/swagger.json & docs.go Document new watch route in Swagger and docs template
agent/agentcontainers/api.go Add watchContainers WS handler in agent API
Comments suppressed due to low confidence (5)

docs/reference/api/agents.md:908

  • This endpoint upgrades to a WebSocket connection rather than a plain HTTP GET; consider updating the code sample to show a WebSocket client (e.g., wscat or JS) for clarity.
curl -X GET http://coder-server:8080/api/v2/workspaceagents/{workspaceagent}/containers/watch \

site/src/modules/resources/useAgentContainers.test.tsx:85

  • Tests currently cover only the initial fetch; add a test that mocks WebSocket messages and verifies that useAgentContainers updates when a message arrives.
});

site/src/modules/resources/AgentRow.tsx:75

  • The queryClient variable is declared but never used; consider removing this import and assignment to reduce clutter.
	const queryClient = useQueryClient();

agent/agentcontainers/api.go:585

  • The slices package is referenced here but not imported, causing a compile error. Add import "slices" or reference the correct package.
		api.updateChans = slices.DeleteFunc(api.updateChans, func(ch chan struct{}) bool {

site/src/modules/resources/useAgentContainers.ts:28

  • Calling invalidateQueries after setQueryData will trigger an extra network request on each WebSocket message. You can remove the invalidation to prevent unnecessary refetches.
			await queryClient.invalidateQueries({ queryKey });

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend changes LGTM 👍

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointed out a few gotchas but otherwise looks good, nice work!

updateCh := make(chan struct{}, 1)
defer func() {
api.mu.Lock()
close(updateCh)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd make sense to move this to the defer below to keep the related code close to each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6ce5c19

return
}

ctx = api.ctx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do this here instead:

	ctx, wsNetConn := codersdk.WebsocketNetConn(ctx, conn, websocket.MessageBinary)
	defer wsNetConn.Close()

Note that we're not passing api.ctx to WebsocketNetConn but that's OK because of the final select (see below).

Then a few important changes following:

  • Heartbeat doesn't change (needs Ping from conn)
  • We can omit conn.Close (handled via wsNetConn.Close)
  • Encoder uses wsNetConn
  • Final select has one entry for api.ctx and another for ctx

This follows practice we use elsewhere in the code-base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible? wsjson.NewEncoder requires a *websocket.Conn but wsNetConn is a net.Conn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wsjson.NewEncoder is a tiny wrapper, i'll just write the underlying logic using wsNetConn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04a92a4

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, I hadn't actually seen/used wsjson.NewEncoder before. The CloseRead it calls would be ideal to retain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have re-added this in 😄 096a85e

@@ -583,8 +647,32 @@ func (api *API) updateContainers(ctx context.Context) error {
api.mu.Lock()
defer api.mu.Unlock()

var knownDevcontainers map[string]codersdk.WorkspaceAgentDevcontainer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Naming so nobody accidentally updates this map rather than api.knownDevcontainers. Perhaps knownDevcontainersBackup or something along those lines those lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in cd0c2d5

return
}

ctx = api.ctx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See related comments on api.go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04a92a4

defer res.Body.Close()
}

d := wsjson.NewDecoder[codersdk.WorkspaceAgentListContainersResponse](conn, websocket.MessageText, slog.Logger{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the repercussion of passing slog.Logger here, unstructured output on stdout?

If it's required, perhaps we should require it as a WatchContainers parameter if we can't put it on the AgentConn struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely forgot to update this when I was tinkering 🤦‍♀️ Will resolve this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3e50965

socket.addEventListener("message", (event) => {
if (event.parseError) {
displayError(
"Unable to process latest data from the server. Please try refreshing the page.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this message be more specific? I.e. mention what operation failed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly although I'm just following existing logic here. I'm not sure what context about the error exists to make this more useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that you are just following the existing logic, but I see this as an opportunity to do better with those messages, having in mind they are displayed to the users.

The displayError function accepts a message and a description, so to make this error better to the user I think we can make the message shorter and tell to the user what is the impact of it, instead of the techinical reason, and give it an way to possible solve the problem as you already put in the message.

displayError(
	"Failed to update containers",
	"Please try refreshing the page"
)

Wdyt?


socket.addEventListener("error", () => {
displayError(
"Unable to get workspace containers. Connection has been closed.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we just give up here? Is there precedent for trying to reconnect later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the frontend changes I was following logic I found elsewhere.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as my previous comment.

displayError(
	"Failed to load containers",
	"Please try refreshing the page"
)

d.WorkspaceFolder == other.WorkspaceFolder &&
d.Status == other.Status &&
d.Dirty == other.Dirty &&
d.Error == other.Error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also check container and agent, so something like this?

Suggested change
d.Error == other.Error
d.Error == other.Error &&
(d.Container == nil && other.Container == nil || (d.Container != nil && other.Continer != nil && d.Container.ID == other.Container.ID)) &&
(d.Agent == nil && ... d.Agent.ID == ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 001ccda

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see d.Agent checked? IMO we should propagate changes to it as well since that's how the frontend associates the sub agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh silly me I only read half the sentence 🤦‍♀️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed again in 971f9d6 😅


// If the agent is unreachable, the request will hang. Assume that if we
// don't get a response after 30s that the agent is unreachable.
ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: call this dialCtx to avoid re-use. I believe we will always have a disconnect after 30s currently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04a92a4

Copy link
Collaborator

@BrunoQuaresma BrunoQuaresma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DanielleMaywood I'm missing some screenshots in the PR description showing the errors being displayed since we don't have stories in Storybook for that. Could you please add that? 🙏

I left a few comments related to the error messages.

socket.addEventListener("message", (event) => {
if (event.parseError) {
displayError(
"Unable to process latest data from the server. Please try refreshing the page.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that you are just following the existing logic, but I see this as an opportunity to do better with those messages, having in mind they are displayed to the users.

The displayError function accepts a message and a description, so to make this error better to the user I think we can make the message shorter and tell to the user what is the impact of it, instead of the techinical reason, and give it an way to possible solve the problem as you already put in the message.

displayError(
	"Failed to update containers",
	"Please try refreshing the page"
)

Wdyt?


socket.addEventListener("error", () => {
displayError(
"Unable to get workspace containers. Connection has been closed.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as my previous comment.

displayError(
	"Failed to load containers",
	"Please try refreshing the page"
)

expect(result.current).toBeUndefined();
});
});
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a plus to have tests to verify:

  • Parsing error
  • Socket error

I know how annoying it can be to have this setup working properly without causing any flakes, so I wanted to let you know in case you are curious about, but it is not a blocket at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to add those.

I just want to give a fair warning: These tests are mostly written by Claude Sonnet 4 as my frontend testing experience is lacking. Please let me know if it has made any obvious blunders 🙏

@mafredri
Copy link
Member

I tried out this branch and it looks like there's still XHR fetches happening in addition to the websocket pushing data. It'd be great if we could stop the XHR from happening and only rely on the websocket.

See:

image

There's no polling though, which is great.

(Cleared network history so websocket isn't visible here, but it's still there and sending data as well.)

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving BE with minor suggestions, as per my earlier comment FE XHR might still need some love.

@DanielleMaywood
Copy link
Contributor Author

DanielleMaywood commented Jul 14, 2025

I tried out this branch and it looks like there's still XHR fetches happening in addition to the websocket pushing data. It'd be great if we could stop the XHR from happening and only rely on the websocket.

See:

image There's no polling though, which is great.

(Cleared network history so websocket isn't visible here, but it's still there and sending data as well.)

I've removed the query invalidation logic. This should reduce the number of XHR requests now. The only area left to handle is the /recreate flow. I'm not sure I want to tackle that in this PR though, so will leave that as a follow up.

I've attempted to reduce as many XHR requests as possible without entirely removing them.

@DanielleMaywood DanielleMaywood merged commit 43b0bb7 into main Jul 14, 2025
33 of 35 checks passed
@DanielleMaywood DanielleMaywood deleted the danielle/container-push branch July 14, 2025 20:35
@github-actions github-actions bot locked and limited conversation to collaborators Jul 14, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants