Description
Background
When Coder users run resource-intensive tasks, such as large builds, they can overwhelm an undersized workspace. The overload can cause the agent to disconnect, leaving users unable to access their workspace until it is restarted. The long-term fix is for a Coder admin to increase the workspace's resources, but we can still reduce how often agents disconnect and force users to restart. With v1, we've integrated certain protective measures to reduce this risk.
CPU
When the Coder agent runs on Linux, we could set nice values to help keep the agent alive under CPU pressure. This should let developers run a taxing build or process in their workspace while staying confident that they won't lose their connection to it.
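For discussion, here is a minimal Go sketch of what adjusting a nice value could look like on Linux. It is not the agent's actual implementation: `clampNice` and `setNice` are hypothetical helpers, and the policy shown (making the current process "nicer") is only an assumption. The same call could instead be applied to the child processes the agent spawns.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// clampNice keeps a requested nice value inside the Linux range [-20, 19].
// (Hypothetical helper, not part of any Coder API.)
func clampNice(n int) int {
	if n < -20 {
		return -20
	}
	if n > 19 {
		return 19
	}
	return n
}

// setNice sets the nice value for pid via setpriority(2).
// A pid of 0 targets the caller; on Linux the nice value is a per-thread
// attribute, so this affects the calling thread and anything it forks later.
// Lowering a nice value (raising priority) normally requires CAP_SYS_NICE.
func setNice(pid, nice int) error {
	return syscall.Setpriority(syscall.PRIO_PROCESS, pid, clampNice(nice))
}

func main() {
	// Assumed policy for illustration: raise our own nice value to 10,
	// which an unprivileged process is allowed to do.
	if err := setNice(0, 10); err != nil {
		fmt.Fprintln(os.Stderr, "setpriority failed:", err)
		os.Exit(1)
	}
	fmt.Println("requested nice value:", clampNice(10))
}
```

One design question this raises: giving the agent itself a lower nice value needs extra privileges, whereas raising the nice value of user processes does not, so the second option may be easier to ship.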
Memory
Is there a way to keep the Coder agent from being OOM-killed in a workspace? From basic reading, it seems we could set memory limits or adjust oom_score_adj.
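As a starting point, the sketch below shows one way to write `oom_score_adj` from Go on Linux. The helper names (`oomScoreAdjPath`, `validateOOMScoreAdj`, `setOOMScoreAdj`) and the score of 500 are assumptions made for illustration; only the `/proc/<pid>/oom_score_adj` file and its range of -1000 to 1000 come from the kernel interface.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Valid range for oom_score_adj as defined by the Linux kernel.
const (
	oomScoreAdjMin = -1000
	oomScoreAdjMax = 1000
)

// oomScoreAdjPath returns the procfs file that holds the adjustment for pid.
func oomScoreAdjPath(pid int) string {
	return fmt.Sprintf("/proc/%d/oom_score_adj", pid)
}

// validateOOMScoreAdj rejects scores outside the kernel's accepted range.
func validateOOMScoreAdj(score int) error {
	if score < oomScoreAdjMin || score > oomScoreAdjMax {
		return fmt.Errorf("oom_score_adj %d outside [%d, %d]",
			score, oomScoreAdjMin, oomScoreAdjMax)
	}
	return nil
}

// setOOMScoreAdj writes score for pid. Higher scores make a process a more
// likely OOM-killer target; lowering a score generally needs CAP_SYS_RESOURCE.
func setOOMScoreAdj(pid, score int) error {
	if err := validateOOMScoreAdj(score); err != nil {
		return err
	}
	data := []byte(strconv.Itoa(score))
	return os.WriteFile(oomScoreAdjPath(pid), data, 0o644)
}

func main() {
	// Assumed policy for illustration: mark this process as a preferred
	// OOM victim, as one might do for a user's build process.
	if err := setOOMScoreAdj(os.Getpid(), 500); err != nil {
		fmt.Fprintln(os.Stderr, "could not set oom_score_adj:", err)
		os.Exit(1)
	}
	fmt.Println("wrote", oomScoreAdjPath(os.Getpid()))
}
```

Raising the score of user processes does not need extra capabilities, which may make it a more practical option than lowering the agent's own score.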
OS considerations
If it is not trivial to support every operating system that agents can run on (macOS, Windows, Linux), let's add this feature only for Linux agents, since that is what 99% of our active users are on. We can re-evaluate this down the road.