Skip to content

feat(agent/agentcontainers): recreate devcontainers concurrently #18042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 26, 2025

Conversation

mafredri
Copy link
Member

This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.

A new Status field was introduced to the devcontainer struct which
replaces Running (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).

The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.

Updates #16424

@mafredri mafredri force-pushed the mafredri/feat-agent-agentcontainers-recreate-async branch 4 times, most recently from d17bca2 to 0ebc262 Compare May 26, 2025 12:02
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.

A new `Status` field was introduced to the devcontainer struct which
replaces `Running` (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).

The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.

Updates #16424
@mafredri mafredri force-pushed the mafredri/feat-agent-agentcontainers-recreate-async branch from 0ebc262 to 9328228 Compare May 26, 2025 12:02
@mafredri mafredri requested a review from Copilot May 26, 2025 12:04
Copilot

This comment was marked as resolved.

// @Param workspaceagent path string true "Workspace agent ID" format(uuid)
// @Param container path string true "Container ID or name"
// @Success 204
// @Success 202 {object} codersdk.Response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the response code change here does not appear to affect any endpoints currently in use by the FE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, not yet.

err = json.NewDecoder(rec.Body).Decode(&resp)
require.NoError(t, err, "unmarshal response failed after recreation")
require.Len(t, resp.Devcontainers, 1, "expected one devcontainer in response after recreation")
assert.Equal(t, codersdk.WorkspaceAgentDevcontainerStatusRunning, resp.Devcontainers[0].Status, "devcontainer is not stopped after recreation")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could also be a require?

err = json.NewDecoder(rec.Body).Decode(&resp)
require.NoError(t, err, "unmarshal response failed after error")
require.Len(t, resp.Devcontainers, 1, "expected one devcontainer in response after error")
assert.Equal(t, codersdk.WorkspaceAgentDevcontainerStatusError, resp.Devcontainers[0].Status, "devcontainer is not in an error state after up failure")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could also be a require?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what benefit? Assert is useful in that it can allow multiple conditions to fail and give you a larger picture of what's wrong. Require is ofc needed whenever you need that condition to be true, like nil and lengths checks so that the rest of the code doesn't panic.

Copy link
Member

@johnstcn johnstcn May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My nit is very much stylistic here; having an assert on the last line below one or more require doesn't help us against panics or any of the above you mentioned. Feel free to ignore!

// The devcontainer state must be set to starting and the recreateWg must be
// incremented before calling this function.
func (api *API) recreateDevcontainer(dc codersdk.WorkspaceAgentDevcontainer, configPath string) {
defer api.recreateWg.Done()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm nervous that a hanging request to recreateDevcontainer could make closing the devcontainers API hang on api.wg.Wait(). I wonder if it makes sense to have a 'graceful' shutdown context and a 'hard' shutdown context that will forcefully cancel all in-flight recreation waitgroups after a certain period of time elapses?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses api.ctx which is essentially that "hard" shutdown context, there isn't any reason to have a graceful one AFAICT.

@mafredri mafredri marked this pull request as ready for review May 26, 2025 14:07
@mafredri mafredri merged commit 0731304 into main May 26, 2025
39 of 41 checks passed
@mafredri mafredri deleted the mafredri/feat-agent-agentcontainers-recreate-async branch May 26, 2025 15:30
@github-actions github-actions bot locked and limited conversation to collaborators May 26, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants