feat: implement agent process management #9461

sreya · 2023-08-31T00:12:05Z

An opt-in feature has been added to the agent that deprioritizes non-coder-related processes for CPU by setting their niceness level to 10.
Opting in requires setting CODER_PROC_MEMNICE_ENABLE to a non-empty value.
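
As a rough illustration of the behaviour described above, here is a minimal Go sketch. It is not the code from this PR: the `proc` type and the `enabled` and `plan` helpers are hypothetical, and skipping processes that are already at or above niceness 10 is an assumption. Only the environment variable name and the niceness value come from the description.

```go
package main

import (
	"fmt"
	"os"
)

// envEnable is the opt-in variable named in the PR description.
const envEnable = "CODER_PROC_MEMNICE_ENABLE"

// targetNiceness is the niceness the description says is applied.
const targetNiceness = 10

// proc is a hypothetical, simplified view of a running process.
type proc struct {
	PID      int
	Niceness int
}

// enabled reports whether the feature is opted in to.
// Any non-empty value counts as "on".
func enabled(getenv func(string) string) bool {
	return getenv(envEnable) != ""
}

// plan returns the PIDs whose niceness would be raised to targetNiceness.
// The agent itself (selfPID) is left alone. Skipping processes that are
// already at or above the target is an assumption of this sketch.
func plan(procs []proc, selfPID int) []int {
	var pids []int
	for _, p := range procs {
		if p.PID == selfPID || p.Niceness >= targetNiceness {
			continue
		}
		pids = append(pids, p.PID)
	}
	return pids
}

func main() {
	if !enabled(os.Getenv) {
		fmt.Println("process priority management is disabled")
		return
	}
	self := os.Getpid()
	procs := []proc{{PID: self}, {PID: self + 1}, {PID: self + 2, Niceness: 15}}
	fmt.Println("would renice:", plan(procs, self))
}
```

Passing `getenv` as a function keeps the opt-in check easy to exercise without touching the real environment.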

ammario · 2023-09-09T19:24:10Z

Why is it opt-in? Seems like a healthy default. Also, could remove a configuration knob.

There's also implicit configuration in that the system can enable / disable priority management via the capabilities it gives the agent.

ammario · 2023-09-09T19:27:04Z

Also: aiming to have a review for this by tomorrow afternoon.

ammario

I'm confused about why we're processing all processes vs. just the agent PID itself. In theory, niceness and oom_score are supposed to be relative: processes that want higher priority should be able to get it without knowledge of every other process.

The current approach of iterating over all processes has scalability issues too. For example, if the agent runs in a large process namespace where it controls only a small number of processes, it would generate immense log spam.

agent/agent.go

agent/agentproc/syscaller_other.go

agent/agentproc/proc.go

agent/agentproc/proc_test.go

agent/agent.go

ammario · 2023-09-10T19:49:10Z

agent/agent.go

+		name := filepath.Base(proc.Name())
+		// If the process is prioritized we should adjust
+		// it's oom_score_adj and avoid lowering its niceness.
+		if slices.Contains(prioritizedProcs, name) {


We want to specifically prioritize the agent and not other coder processes, right? If I'm reading this code correctly, it would treat coder server and coder stat the same as the agent.

sreya · 2023-09-11

This is a good catch. I don't see that as being a big deal, but we can be more discriminating about which processes we prioritize by also parsing command arguments. WDYT?

ammario · 2023-09-13

Why not just check if it's the current process?

agent/agent.go

sreya · 2023-09-11T20:23:00Z

> I'm confused about why we're processing all processes vs. just the agent PID itself. In theory, niceness and oom_score are supposed to be relative: processes that want higher priority should be able to get it without knowledge of every other process.
>
> The current approach of iterating over all processes has scalability issues too. For example, if the agent runs in a large process namespace where it controls only a small number of processes, it would generate immense log spam.

This implementation was adopted from v1, since it re-implements functionality requested by a v1 customer. I imagine it was implemented this way because decreasing a process's niceness requires CAP_SYS_NICE, whereas increasing it needs no additional capabilities. Coder admins are not always sysadmins, so they may not be able to grant additional capabilities.

> Why is it opt-in? Seems like a healthy default. Also, could remove a configuration knob.

I figured it's safer to introduce this sort of functionality as opt-in, especially because interested parties only need to add a single env var to their template to enable it. If there are any unforeseen issues with the feature, keeping it opt-in avoids a serious regression for every customer, as opposed to only the few who are specifically requesting it. Eventually I'd like to promote it to enabled by default once we're confident there aren't any surprises.
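
The capability argument above can be reduced to a small check. This is only a sketch with a hypothetical helper name, `requiresCapSysNice`. It assumes the usual Linux rule that a process may raise niceness freely but needs CAP_SYS_NICE to lower it, and it ignores RLIMIT_NICE and process ownership.

```go
package main

import "fmt"

// requiresCapSysNice reports whether changing a process's niceness from
// current to target lowers the value (raises priority). On Linux that
// direction generally needs CAP_SYS_NICE; raising niceness does not.
// RLIMIT_NICE and process ownership are deliberately ignored here.
func requiresCapSysNice(current, target int) bool {
	return target < current
}

func main() {
	// Deprioritizing another process: 0 -> 10 needs no extra capability.
	fmt.Println(requiresCapSysNice(0, 10)) // false
	// Prioritizing the agent instead: 0 -> -10 would need the capability.
	fmt.Println(requiresCapSysNice(0, -10)) // true
}
```

This is why demoting everything else can work in environments where the agent cannot be given additional capabilities, while promoting only the agent cannot.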

ammario · 2023-09-12T05:53:48Z

SGTM

ammario · 2023-09-13T21:42:06Z

agent/agent.go

+		name := filepath.Base(proc.Name())
+		// If the process is prioritized we should adjust
+		// it's oom_score_adj and avoid lowering its niceness.
+		if slices.Contains(prioritizedProcs, name) {


Why not just check if it's the current process?

agent/agent_test.go

deansheather

Looks good to me, but Ammar's comment about only prioritizing based on PID should be addressed before merge.

sreya · 2023-09-14T20:20:27Z

> Why not just check if it's the current process?

In the future we may want to prioritize other processes, such as sshd, which is why it's implemented the way it is, but we can just make it PID-based for now.
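
To make the trade-off in this thread concrete, the following sketch contrasts the two checks. The `proc` type and both helper names are hypothetical and are not taken from the PR. It assumes the name-based check compares the base name of the executable, as in the quoted diff.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"slices"
)

// proc is a hypothetical, simplified view of a running process.
type proc struct {
	PID  int
	Name string // executable path or name
}

// prioritizedByName mirrors the name-based idea from the quoted diff:
// any process whose executable base name is in the list matches.
func prioritizedByName(p proc, names []string) bool {
	return slices.Contains(names, filepath.Base(p.Name))
}

// prioritizedByPID is the narrower check suggested in review:
// only the current process (the agent itself) matches.
func prioritizedByPID(p proc, selfPID int) bool {
	return p.PID == selfPID
}

func main() {
	self := os.Getpid()
	agent := proc{PID: self, Name: "/usr/bin/coder"}
	other := proc{PID: self + 1, Name: "/usr/bin/coder"} // e.g. another coder subcommand

	names := []string{"coder"}
	fmt.Println(prioritizedByName(agent, names), prioritizedByName(other, names)) // true true
	fmt.Println(prioritizedByPID(agent, self), prioritizedByPID(other, self))     // true false
}
```

The name-based check extends naturally to other binaries such as sshd, at the cost of matching every process that shares the agent's executable name.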

github-actions bot assigned sreya Aug 31, 2023

sreya force-pushed the jon/agentproc branch from d795bf2 to ba137c1 Compare September 1, 2023 00:11

sreya added 7 commits September 8, 2023 22:24

feat: implement agent process management

4ed4069

- An opt-in feature has been added to the agent to allow deprioritizing non coder-related processes for both CPU and memory. Non coder processes have their niceness set to 10 and their oom_score_adj set to 100

improve process detection

7e59db6

add agentproc tests

8c65216

some minor agent tests

760cbb3

add a proper test for proc management

f4b864e

custom nice

cbcb854

refactor into build files

8230247

sreya force-pushed the jon/agentproc branch from fb73f8c to 8230247 Compare September 8, 2023 22:24

sreya added 2 commits September 8, 2023 22:45

tick tock

3e1defd

make fmt

2fe9c70

sreya requested a review from ammario September 8, 2023 22:49

ammario requested changes Sep 10, 2023

pr comments

ef41e9a

sreya marked this pull request as ready for review September 12, 2023 22:39

sreya added 5 commits September 12, 2023 22:44

lint

05baba0

whoops

478d57c

skip non-linux

0ced5ce

skip non-linux

8aaa6d5

prevent race

cea4851

sreya requested review from ammario and deansheather September 13, 2023 16:09

ammario approved these changes Sep 13, 2023

deansheather reviewed Sep 14, 2023

sreya added 6 commits September 14, 2023 20:30

only prioritize ourselves

5020eb4

Revert "only prioritize ourselves"

11ab047

This reverts commit 5020eb4.

only prioritize coder agent

46ef05a

remove oom_score_adj

d132480

defer first

04ee5cb

avoid resetting niceness for already niced procs

ffbeab9

sreya merged commit 7311ffb into main Sep 15, 2023

sreya deleted the jon/agentproc branch September 15, 2023 00:45

github-actions bot locked and limited conversation to collaborators Sep 15, 2023
