feat(agent/agentcontainers): implement sub agent injection #18245

mafredri · 2025-06-05T12:40:52Z

This change adds support for sub agent creation and injection into dev
containers.

TODO:

- Pass the correct access URL to sub agent
- Add integration test
- Use correct directory for sub agent (requires on-disk devcontainer.json parsing; originally a follow-up PR, now implemented via a pwd check, which we can improve in the future with a "materialized devcontainer.json" via devcontainer read-configuration)
- Parse .customizations.coder.devcontainer.name from the Docker container label (materialized devcontainer.json on creation, follow-up PR)
- Add support for downloading agent binaries for different architectures (follow-up PR)
- Make sure sub agents have reduced capabilities (e.g. no containers API, follow-up PR): feat(agent): disable devcontainers for sub agents #18303

This change adds support for sub agent creation and injection into dev containers. Closes coder/internal#621
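For the multi-architecture follow-up, the agent would first need to translate what the container reports into a Go `GOARCH` value before choosing which binary to inject. A minimal sketch, assuming the architecture is obtained by running `uname -m` inside the container; `goArchFromUname` is a hypothetical helper and not part of this PR, which copies the parent agent's own binary.

```go
package main

import (
	"fmt"
	"strings"
)

// goArchFromUname maps `uname -m` output to a Go GOARCH value.
// Hypothetical helper: this PR only copies the running agent binary itself.
func goArchFromUname(machine string) (string, error) {
	switch strings.TrimSpace(machine) {
	case "x86_64", "amd64":
		return "amd64", nil
	case "aarch64", "arm64":
		return "arm64", nil
	case "armv7l", "armv7", "armv6l":
		return "arm", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", machine)
	}
}
```

If the result differs from the parent agent's `runtime.GOARCH`, the agent would have to download a matching binary instead of copying itself.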

mafredri · 2025-06-06T16:14:53Z

I'm still working on an integration test and the existing mocks are being a PITA (think those are about sorted now though). Promoting this to "ready for review" to get some feedback on the approach @DanielleMaywood @johnstcn.

(Also going to break out the "follow-up PR" tasks into new issues before merging this.)

johnstcn

I still have to read some more but adding my comments so far.

agent/agentcontainers/api.go

johnstcn · 2025-06-06T16:33:42Z

agent/agentcontainers/api.go

+	err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
+		WithContainerID(container.ID),
+		WithRemoteEnv(
+			"CODER_AGENT_URL="+api.subAgentURL,
+			"CODER_AGENT_TOKEN="+agent.AuthToken.String(),
+		),
+	)


Would it make more sense to background this? If the parent agent ends up crashing and being restarted, we'll lose the sub-agents and have to re-inject them. We can keep track of the expected PID in e.g. /.coder-agent/pid

We could probably background it either on the host or inside the container, but not doing so has some nice properties:

- We immediately discover if a sub agent exits/crashes and could restart it right away (we don't currently)
- Job control is simpler (simply cancel the context vs. looking up processes and verifying against a PID)
- With prebuilds, we can exit all sub-agents on claim and re-inject afterwards to ensure a clean slate

For the case where the parent agent crashes, keeping those sub-agents may be hit-and-miss, and those dev containers could be affected anyway on agent startup. I'm not aware of agents crashing, though, so this might not be a concern we need to address now?

Fair enough!

johnstcn · 2025-06-06T16:36:27Z

agent/agentcontainers/api.go

+	if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "setcap", "cap_net_admin+ep", coderPathInsideContainer); err != nil {
+		logger.Warn(ctx, "set CAP_NET_ADMIN on agent binary failed", slog.Error(err))
+	}


This will probably fail unless the container is running as privileged or has the specific CAP_NET_ADMIN privilege set on the container?

As per the comment, this is an optional networking boost. (See regular agent bootstrap script, I'll update the comment to reference it.) Did you have some action in mind?

We could check for both of these things before trying? Not a blocker though.

Sure, I don't think it's very high priority but let's create a ticket for future enhancement. 👍🏻

coder/internal#683
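One possible pre-check for coder/internal#683: a file capability added with `setcap` can only take effect if the capability is in the process bounding set, which Linux reports as a hex mask on the `CapBnd:` line of `/proc/<pid>/status`. A sketch, assuming the status text is read inside the container (for example `cat /proc/1/status` through the same `ExecAs` call the diff uses, assuming its first return value is the command output); `hasCapInBoundingSet` is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// capNetAdmin is the Linux capability number of CAP_NET_ADMIN.
const capNetAdmin = 12

// hasCapInBoundingSet reports whether capability bit is set in the CapBnd
// line of a /proc/<pid>/status dump. A file capability set via setcap can
// only become effective if it is also in the bounding set.
func hasCapInBoundingSet(status string, bit uint) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok || key != "CapBnd" {
			continue
		}
		mask, err := strconv.ParseUint(strings.TrimSpace(val), 16, 64)
		if err != nil {
			return false, fmt.Errorf("parse CapBnd: %w", err)
		}
		return mask&(1<<bit) != 0, nil
	}
	if err := sc.Err(); err != nil {
		return false, err
	}
	return false, fmt.Errorf("CapBnd not found")
}
```

For example, a container started with Docker's default capability set typically reports `CapBnd: 00000000a80425fb`, which does not include bit 12, so the `setcap` call could be skipped without a warning.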

johnstcn · 2025-06-06T16:37:59Z

agent/agentcontainers/api.go

+	// Make sure the agent binary is executable so we can run it.
+	if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "chmod", "+x", coderPathInsideContainer); err != nil {
+		return xerrors.Errorf("set agent binary executable: %w", err)
+	}


Do we also need to chown the binary so that it's readable by the default container user?

Good callout. I didn't consider this, but docker cp seems to follow the permissions of the file on disk. So unless we chown it, it could be nonsensical within the container (non-existent user, etc.).

It's unlikely that the permissions will be bad for the user (typically 0755), but we could certainly improve it. It might make sense to turn this into a script rather than N separate docker exec calls.

johnstcn · 2025-06-06T16:38:56Z

agent/agentcontainers/api.go

+
+	logger.Info(ctx, "starting subagent in dev container")
+
+	err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},


Do we try to execute this as a non-root user?

AFAIK this will get executed as the remote user configured by devcontainer.json (or if unconfigured, container user), which seems like the correct behavior to me.

johnstcn · 2025-06-06T16:46:36Z

agent/agentcontainers/api.go

+	injected := make(map[uuid.UUID]bool, len(api.injectedSubAgentProcs))
+	for _, proc := range api.injectedSubAgentProcs {
+		injected[proc.agent.ID] = true
+	}


This could probably be a map[uuid.UUID]struct{} instead, and then below on line 888 just check for _, found := injected[agent.ID]

I don't foresee the memory savings being necessary here (will we have 1000s of sub agents?). The current form reads better and is simpler to use IMO (I always prefer this form for readability where applicable).
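The two set idioms discussed in this thread, side by side. For self-containment this sketch uses `string` IDs instead of `uuid.UUID`; the helper names are illustrative and not from the PR.

```go
package main

// injectedSetBool builds a set as map[string]bool. A missing key reads as
// false, so membership checks are plain conditions.
func injectedSetBool(ids []string) map[string]bool {
	m := make(map[string]bool, len(ids))
	for _, id := range ids {
		m[id] = true
	}
	return m
}

// injectedSetStruct builds the same set with zero-size values. Membership
// checks need the two-value "comma ok" form.
func injectedSetStruct(ids []string) map[string]struct{} {
	m := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		m[id] = struct{}{}
	}
	return m
}

// staleIDs returns the IDs in all that are not in injected, using the
// bool form as the PR does.
func staleIDs(all []string, injected map[string]bool) []string {
	var out []string
	for _, id := range all {
		if injected[id] {
			continue
		}
		out = append(out, id)
	}
	return out
}
```

With the `struct{}` form, the loop condition would become `if _, found := injected[id]; found`. At the number of sub agents discussed here, the memory difference is negligible.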

johnstcn · 2025-06-06T16:49:20Z

agent/agentcontainers/api.go

+	for _, agent := range agents {
+		if injected[agent.ID] {
+			continue
+		}
+		err := api.subAgentClient.Delete(ctx, agent.ID)
+		if err != nil {
+			api.logger.Error(ctx, "failed to delete agent",
+				slog.Error(err),
+				slog.F("agent_id", agent.ID),
+				slog.F("agent_name", agent.Name),
+			)
+		}
+	}


Should we set an upper bound on deletion attempts and raise an error if more than, say, 3 attempts fail?

Are you suggesting silently ignoring failures unless >= 3 fail? Or perhaps adding retry logic?

I'm mainly worried about spamming error logs into the void.

These will be part of the parent agent log 🤔

We can leave it as-is for now, but I think if this does start happening frequently (or all the time) it may be difficult to catch if it just goes into the parent agent log.

mafredri · 2025-06-09T16:08:17Z

@DanielleMaywood @johnstcn I've added WithContainerLabelIncludeFilter to filter out injection in tests and prevent them from interfering with non-test dev containers.

I also added WithSubAgentEnv to update the autostart integration test in agent package. It now verifies that a sub agent is started as well.
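The comment doesn't spell out the semantics of `WithContainerLabelIncludeFilter`; a plausible reading, sketched below under that assumption, is that only containers carrying every configured label key=value pair are considered for injection. The `container` type and `filterByLabels` are hypothetical.

```go
package main

// container is a minimal hypothetical stand-in for the PR's container type.
type container struct {
	ID     string
	Labels map[string]string
}

// filterByLabels keeps only containers that carry every key=value pair in
// include. An empty or nil include map keeps everything.
func filterByLabels(cs []container, include map[string]string) []container {
	var out []container
	for _, c := range cs {
		match := true
		for k, v := range include {
			if got, ok := c.Labels[k]; !ok || got != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, c)
		}
	}
	return out
}
```

A test could label its own containers with a unique value so that dev containers already running on the machine never match.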

johnstcn · 2025-06-10T08:11:04Z

agent/agent_test.go

+	token := os.Getenv("CODER_AGENT_TOKEN")
+	if url == "" || token == "" {
+		_, _ = fmt.Fprintln(os.Stderr, "CODER_AGENT_URL and CODER_AGENT_TOKEN must be set")
+		return 10


Can we name these specific status codes as something more meaningful to human eyes?

They don't really have a meaning; they just differentiate the failure states. I started at 10 because I got tired of bumping everything as I added more checks 😅. The println should hopefully make each case clear.
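Naming the codes, as suggested, could be as small as a `const` block. Only the value 10 appears in the excerpt; the constant names and `checkSubAgentEnv` are hypothetical, and `getenv` is injected so the check can run without touching the process environment.

```go
package main

// Exit codes for the test sub-agent main. Names are hypothetical; only the
// value 10 appears in the diff excerpt.
const (
	exitOK              = 0
	exitMissingAgentEnv = 10 // CODER_AGENT_URL or CODER_AGENT_TOKEN unset
)

// checkSubAgentEnv mirrors the excerpt's environment check.
func checkSubAgentEnv(getenv func(string) string) int {
	if getenv("CODER_AGENT_URL") == "" || getenv("CODER_AGENT_TOKEN") == "" {
		return exitMissingAgentEnv
	}
	return exitOK
}
```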

johnstcn · 2025-06-10T08:12:07Z

agent/agent_test.go

+		}
+		defer r.Body.Close()
+
+		t.Logf("Sub-agent request payload received: %+v", payload)


suggestion: do we perhaps want to allow the caller to run some function against the payload?

We send it on the channel and do some verification already 👍🏻

johnstcn · 2025-06-10T08:13:00Z

agent/agent_test.go

+			// The agent will copy "itself", but in the case of this test, the
+			// agent is actually this test binary. So we'll tell the test binary
+			// to execute the sub-agent main function via this env.
+			agentcontainers.WithSubAgentEnv("CODER_TEST_RUN_SUB_AGENT_MAIN=1"),
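The excerpt relies on a common Go testing trick: the copied "agent" is really the test binary, so it must check an environment variable before running tests and switch to a sub-agent entry point. A sketch of the dispatch step; `maybeRunSubAgentMain` is hypothetical, only the `CODER_TEST_RUN_SUB_AGENT_MAIN=1` variable comes from the diff.

```go
package main

// subAgentMainEnv is the variable set via WithSubAgentEnv in the excerpt.
const subAgentMainEnv = "CODER_TEST_RUN_SUB_AGENT_MAIN"

// maybeRunSubAgentMain is meant to be called first thing in TestMain. When
// the test binary was re-executed as a sub agent, it runs subMain and
// reports ran=true so that the normal tests are skipped.
func maybeRunSubAgentMain(getenv func(string) string, subMain func() int) (code int, ran bool) {
	if getenv(subAgentMainEnv) != "1" {
		return 0, false
	}
	return subMain(), true
}
```

In `TestMain`, one would call `maybeRunSubAgentMain(os.Getenv, testSubAgentMain)` (with `testSubAgentMain` a hypothetical name for the sub-agent entry point) and `os.Exit(code)` when `ran` is true, otherwise fall through to `os.Exit(m.Run())`.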


mafredri · 2025-06-10T09:17:31Z

One last addendum: I implemented a quick 'n' dirty pwd check to get the directory using devcontainer exec, since the hard-coded path wasn't working in many cases.
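The pwd check presumably relies on `devcontainer exec` starting in the container's configured workspace folder. A sketch of the parsing side, under the assumption that the captured output may contain stray lines; `workspaceDirFromPwd` is hypothetical, and it uses `path` rather than `path/filepath` because the container path is POSIX regardless of the host OS.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// workspaceDirFromPwd extracts the working directory from the output of
// running `pwd` inside the dev container. It takes the last line of the
// trimmed output, in case anything else was written to the same stream.
func workspaceDirFromPwd(out string) (string, error) {
	lines := strings.Split(strings.TrimSpace(out), "\n")
	dir := strings.TrimSpace(lines[len(lines)-1])
	if dir == "" || !path.IsAbs(dir) {
		return "", fmt.Errorf("unexpected pwd output %q", out)
	}
	return path.Clean(dir), nil
}
```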

github-actions bot assigned mafredri Jun 5, 2025

This was referenced Jun 5, 2025

chore(agent): update agent proto client #18242

Merged

feat(agent/agentcontainers): refactor Lister to ContainerCLI and implement new methods #18243

Merged

feat(agent/agentcontainers): add Exec method to devcontainers CLI #18244

Merged

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from d49f84e to 011a8aa June 5, 2025 12:51

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 91ff08e to 3960774 June 5, 2025 12:52

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 011a8aa to 63f93bc June 5, 2025 13:59

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 3960774 to 1cf1905 June 5, 2025 13:59

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 63f93bc to 0deaab8 June 6, 2025 08:44

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 1cf1905 to f190036 June 6, 2025 08:44

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 0deaab8 to 8796ba3 June 6, 2025 09:30

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch 2 times, most recently from dc146ab to d1447f3 June 6, 2025 09:45

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 8796ba3 to adbfd45 June 6, 2025 11:20

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from d1447f3 to 3547372 June 6, 2025 11:27

Base automatically changed from mafredri/feat-agent-devcontainer-injection-3 to main June 6, 2025 11:39

feat(agent/agentcontainers): implement sub agent injection

7358ee0

This change adds support for sub agent creation and injection into dev containers. Closes coder/internal#621

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 3547372 to 7358ee0 June 6, 2025 11:39

mafredri added 3 commits June 6, 2025 12:06

implement sub agent url

34aa574

improve doc on container workspace folder, add todo

7a3c8a3

fix coderd and cli tests

eb29bba

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from a8e4495 to eb29bba June 6, 2025 15:59

fix

aa42ab8

mafredri marked this pull request as ready for review June 6, 2025 16:14

johnstcn reviewed Jun 6, 2025


mafredri added 3 commits June 9, 2025 08:49

skip test on win

fb4cdad

fix review comments

cf17cd4

ensure agent binary permissions owner/o+rx

780483b

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 466bc6b to 780483b June 9, 2025 09:30

johnstcn approved these changes Jun 9, 2025


DanielleMaywood approved these changes Jun 9, 2025


mafredri added 2 commits June 9, 2025 11:14

update cap net admin comment

934a222

implement fake agent api sub agent methods

56c7ceb

mafredri mentioned this pull request Jun 9, 2025

Check dev container (container) properties before attempting to modify CAP_NET_ADMIN coder/internal#683

Open

mafredri added 4 commits June 9, 2025 11:27

do not set workspace folder if container id

591a9bf

add WithContainerLabelIncludeFilter

9afa5ea

add sub agent env and revert container id change

050177b

add sub agent as part of autostart integration test

d5eb3fc

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from abe9116 to d5eb3fc June 9, 2025 16:04

fixup! add sub agent env and revert container id change

1629bee

mafredri added 2 commits June 9, 2025 16:27

fixup! add sub agent as part of autostart integration test

67ee0c5

fixup! fixup! add sub agent env and revert container id change

757dc85

DanielleMaywood approved these changes Jun 10, 2025


johnstcn approved these changes Jun 10, 2025


use the correct directory for the sub agent

dc7f7c3

DanielleMaywood approved these changes Jun 10, 2025


mafredri merged commit fca9917 into main Jun 10, 2025
31 checks passed

mafredri deleted the mafredri/feat-agent-devcontainer-injection-4 branch June 10, 2025 09:37

github-actions bot locked and limited conversation to collaborators Jun 10, 2025

