Prevent agents from being killed in CPU or memory-constrained workspaces #8517

@bpmct

Background

When Coder users run demanding tasks, such as large builds, they can overwhelm an undersized workspace. The overload can cause the agent to disconnect, leaving users unable to access their workspace until it is restarted. Increasing the workspace's resources is the long-term fix and is up to the Coder admin, but we can still reduce how often agents disconnect and force users to restart. With v1, we've integrated certain protective measures to reduce this risk.

CPU

When the Coder agent runs on Linux, we could set nice values so that the agent keeps getting CPU time under load and stays connected. This should allow developers to run a taxing build or process in their workspace while still having confidence they won't lose their connection to it.
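
To make the nice-value idea concrete, here is a minimal Python sketch, not a proposed implementation. The names `clamp_nice`, `try_set_nice`, `AGENT_NICE`, and `CHILD_NICE`, and the values -10 and 10, are assumptions chosen for illustration. One caveat: lowering a nice value (raising priority) normally needs CAP_SYS_NICE or a permissive RLIMIT_NICE, so an unprivileged agent may only be able to raise the nice value of the processes it spawns rather than lower its own.

```python
import os

# Hypothetical targets: neither value comes from this issue.
AGENT_NICE = -10  # higher priority for the agent (usually needs privileges)
CHILD_NICE = 10   # lower priority for user workloads started by the agent


def clamp_nice(value: int) -> int:
    """Clamp a requested nice value to the Linux range [-20, 19]."""
    return max(-20, min(19, value))


def try_set_nice(pid: int, value: int) -> bool:
    """Try to set the nice value of `pid` (0 means the calling process).

    Returns False instead of raising when the kernel refuses, for
    example because the caller lacks the privilege to lower the value
    or the process has already exited.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, clamp_nice(value))
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Prefer boosting the agent; report the outcome either way.
    boosted = try_set_nice(0, AGENT_NICE)
    print("agent nice now:", os.getpriority(os.PRIO_PROCESS, 0), "boosted:", boosted)
```

If the boost is refused, a fallback would be to call `try_set_nice(child_pid, CHILD_NICE)` on each spawned process, which needs no extra privileges because it only lowers priority.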

Memory

Is there a way to help the Coder agent avoid being OOM-killed in a workspace? From basic reading, it seems like we could set memory limits or adjust oom_score_adj.
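
As a sketch of the oom_score_adj option, the Python snippet below writes a score adjustment to `/proc/<pid>/oom_score_adj`. The helper names and the `proc_root` parameter (added only so the function can be exercised against a temporary directory) are assumptions. On Linux the value ranges from -1000 (never selected by the OOM killer) to 1000 (selected first), and lowering it requires CAP_SYS_RESOURCE, so an unprivileged agent may only be able to raise the score of its child processes.

```python
from pathlib import Path


def clamp_oom_score_adj(value: int) -> int:
    """Clamp to the range the kernel accepts: [-1000, 1000]."""
    return max(-1000, min(1000, value))


def try_set_oom_score_adj(pid: int, value: int, proc_root: str = "/proc") -> bool:
    """Try to write `value` to <proc_root>/<pid>/oom_score_adj.

    Returns False instead of raising when the write is refused
    (for example, lowering the score without CAP_SYS_RESOURCE)
    or when the process no longer exists.
    """
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    try:
        path.write_text(f"{clamp_oom_score_adj(value)}\n")
        return True
    except OSError:
        return False
```

For example, `try_set_oom_score_adj(child_pid, 500)` makes a spawned build process a more likely OOM victim than the agent, where `child_pid` is a hypothetical PID and 500 is an arbitrary illustrative value.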

OS considerations

If it is not trivial to support every operating system that agents can run on (macOS, Windows, Linux), let's add this feature only for Linux agents, since that is what 99% of our active users are on. We can re-evaluate this down the road.
