-
Notifications
You must be signed in to change notification settings - Fork 873
feat: reinitialize agents when a prebuilt workspace is claimed #17475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
35e4bf8
to
18da76e
Compare
fe569d4
to
fcdbba8
Compare
cc25406
to
26dbc3a
Compare
ec9ed29
to
362db7c
Compare
// Complete the job, optionally triggering workspace agent reinitialization: | ||
|
||
completedJob := proto.CompletedJob{ | ||
JobId: job.ID.String(), | ||
Type: &proto.CompletedJob_WorkspaceBuild_{ | ||
WorkspaceBuild: &proto.CompletedJob_WorkspaceBuild{}, | ||
}, | ||
} | ||
_, err = srv.CompleteJob(ctx, &completedJob) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate by defining 'this'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CompleteJob
need to occur for the test to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah. CompleteJob is the thing that publishes to the channel. Its the entire point of this test, yes. If CompleteJob ever stops publishing, this should fail.
} | ||
|
||
go func() { | ||
<-ctx.Done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Howcome we wait for this to complete before cancel
-ing? To protect messages from being passed to workspaceClaims
after we close the channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we expect the user to defer cancel()
where they call this. But we can't enforce that. As a defensive measure to protect against leaking the goroutine and pubsub connection, as well as the channel, we call cancel() ourselves when the context expires.
coderd/prebuilds/claim.go
Outdated
} | ||
|
||
func StreamAgentReinitEvents(ctx context.Context, logger slog.Logger, rw http.ResponseWriter, r *http.Request, reinitEvents <-chan agentsdk.ReinitializationEvent) { | ||
sseSendEvent, sseSenderClosed, err := httpapi.ServerSentEventSender(rw, r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can fix this later, but I think this might be inappropriate at this layer.
This seems like a detail of the HTTP API, and we aren't testing this directly so I'd argue this code could go back to coderd/workspaceagents.go
.
Message string `json:"message"` | ||
Reason ReinitializationReason `json:"reason"` | ||
WorkspaceID uuid.UUID | ||
UserID uuid.UUID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bleeds the abstraction a bit because there's nothing prebuilds-specific in this func. The workspace's agent is what reinitializes, and the user should be irrelevant here. Do we need this?
@@ -294,7 +299,7 @@ message Metadata { | |||
string workspace_owner_login_type = 18; | |||
repeated Role workspace_owner_rbac_roles = 19; | |||
bool is_prebuild = 20; | |||
string running_workspace_agent_token = 21; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-review: it concerns me that running_workspace_agent_token
made it into main. I'd love to undo that before this is released but it might already be too late.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running_workspace_agent_token is defunct and meant to be replaced by running_agent_auth_tokens.
The former only supports a single agent. The latter supports multiple agents.
Nothing will break if the former stays, but it pollutes the API
message RunningAgentAuthToken { | ||
string agent_id = 1; | ||
string token = 2; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding a new field, you need to increment proto.CurrentVersion
. We tend to add a comment as well explaning the versioning history. See provisionerd/proto/version.go
.
tokens := metadata.GetRunningAgentAuthTokens() | ||
if len(tokens) == 1 { | ||
env = append(env, provider.RunningAgentTokenEnvironmentVariable("")+"="+tokens[0].Token) | ||
} else { | ||
// Not currently supported, but added for forward-compatibility | ||
for _, t := range tokens { | ||
// If there are multiple agents, provide all the tokens to terraform so that it can | ||
// choose the correct one for each agent ID. | ||
env = append(env, provider.RunningAgentTokenEnvironmentVariable(t.AgentId)+"="+t.Token) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't changes here also introduce tests in provisioner_test.go
?
defer srv.Close() | ||
|
||
requestCtx := testutil.Context(t, testutil.WaitShort) | ||
req, err := http.NewRequestWithContext(requestCtx, "GET", srv.URL, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use srv.Client()
here?
// - prebuild claimed: a prebuilt workspace is claimed, so the agent must reinitialize. | ||
func (c *Client) WaitForReinit(ctx context.Context) (*ReinitializationEvent, error) { | ||
// TODO: allow configuring httpclient | ||
c.SDK.HTTPClient.Timeout = time.Hour * 24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this essentially place an upper limit of 24 hours on the maximum lifetime of a prebuild?
@@ -16,8 +18,9 @@ VALUES ( | |||
|
|||
-- name: InsertPresetParameters :many | |||
INSERT INTO | |||
template_version_preset_parameters (template_version_preset_id, name, value) | |||
template_version_preset_parameters (id, template_version_preset_id, name, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it important to be able to control the ID of a template version preset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when setting up test environments where one needs to assert based on the relationships between presets, their parameters and other parts of the DB, we need to be able to express that relationship in the test by setting the ids explicitly.
// | ||
// waiter := coderdtest.NewWorkspaceAgentWaiter(t, client, r.Workspace.ID) | ||
// waiter.WaitFor(coderdtest.AgentsReady) | ||
type WaitForCriterium func(agent codersdk.WorkspaceAgent) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion, nonblocking: rename to WaitForAgentFn
for clarity
This pull request allows coder workspace agents to be reinitialized when a prebuilt workspace is claimed by a user. This facilitates the transfer of ownership between the anonymous prebuilds system user and the new owner of the workspace.
Only a single agent per prebuilt workspace is supported for now, but plumbing has already been done to facilitate the seamless transition to multi-agent support.