Skip to content

CODER_PROC_PRIO_MGMT puts noise in logs when processes exit before renicing #14651

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aaronlehmann opened this issue Sep 11, 2024 · 8 comments · Fixed by #14655
Closed

CODER_PROC_PRIO_MGMT puts noise in logs when processes exit before renicing #14651

aaronlehmann opened this issue Sep 11, 2024 · 8 comments · Fixed by #14655
Assignees
Labels
observability Issues related to observability (metrics, dashboards, alerts, opentelemetry)

Comments

@aaronlehmann
Copy link
Contributor

CODER_PROC_PRIO_MGMT seems to have a race when processes exit before the agent has a chance to renice it. This leads to a lot of noise in workspace logs:

2024-09-11 22:28:12.826 [warn]  unable to get proc niceness  cmd=""  pid=7899 ...
2024-09-11 22:28:12.827 [info]  error= get priority for "":
2024-09-11 22:28:12.827 [info]  github.com/coder/coder/v2/agent/agentproc.(*Process).Niceness
2024-09-11 22:28:12.827 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/proc_unix.go:107
2024-09-11 22:28:12.827 [info]  - get priority:
2024-09-11 22:28:12.827 [info]  github.com/coder/coder/v2/agent/agentproc.UnixSyscaller.GetPriority
2024-09-11 22:28:12.827 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/syscaller_unix.go:30
2024-09-11 22:28:12.827 [info]  - no such process
@coder-labeler coder-labeler bot added bug observability Issues related to observability (metrics, dashboards, alerts, opentelemetry) labels Sep 11, 2024
@sreya sreya self-assigned this Sep 12, 2024
@sreya
Copy link
Collaborator

sreya commented Sep 12, 2024

@aaronlehmann how often do these logs show up? There's some logic to suggest that these should be limited to one a minute. Just making sure there isn't more than one bug to fix.

@aaronlehmann
Copy link
Contributor Author

Here's what I see when I run:

while true; do sleep 0.0001 ; done
2024-09-12 17:14:13.420 [erro]  manage process priority ...
2024-09-12 17:14:13.431 [info]  error= list:
2024-09-12 17:14:13.431 [info]  github.com/coder/coder/v2/agent.(*agent).manageProcessPriority
2024-09-12 17:14:13.431 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agent.go:1652
2024-09-12 17:14:13.431 [info]  - read cmdline:
2024-09-12 17:14:13.431 [info]  github.com/coder/coder/v2/agent/agentproc.List
2024-09-12 17:14:13.431 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/proc_unix.go:52
2024-09-12 17:14:13.431 [info]  - read /proc/8644/cmdline: no such process
2024-09-12 17:14:23.421 [warn]  unable to get proc niceness  cmd=""  pid=9549 ...
2024-09-12 17:14:23.430 [info]  error= get priority for "":
2024-09-12 17:19:29.430 [info]  github.com/coder/coder/v2/agent/agentproc.(*Process).Niceness
2024-09-12 17:19:29.430 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/proc_unix.go:107
2024-09-12 17:19:29.430 [info]  - get priority:
2024-09-12 17:19:29.430 [info]  github.com/coder/coder/v2/agent/agentproc.UnixSyscaller.GetPriority
2024-09-12 17:19:29.430 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/syscaller_unix.go:30
2024-09-12 17:19:29.430 [info]  - no such process
2024-09-12 17:20:36.421 [warn]  unable to get proc niceness  cmd="sleep 0.0001 "  pid=82473 ...
2024-09-12 17:20:36.423 [info]  error= get priority for "sleep\x000.0001\x00":
2024-09-12 17:20:36.423 [info]  github.com/coder/coder/v2/agent/agentproc.(*Process).Niceness
2024-09-12 17:20:36.423 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/proc_unix.go:107
2024-09-12 17:20:36.423 [info]  - get priority:
2024-09-12 17:20:36.423 [info]  github.com/coder/coder/v2/agent/agentproc.UnixSyscaller.GetPriority
2024-09-12 17:20:36.423 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/syscaller_unix.go:30
2024-09-12 17:20:36.423 [info]  - no such process
2024-09-12 17:21:51.420 [warn]  unable to get proc niceness  cmd="sleep 0.0001 "  pid=152231 ...
2024-09-12 17:21:51.426 [info]  error= get priority for "sleep\x000.0001\x00":
2024-09-12 17:21:51.426 [info]  github.com/coder/coder/v2/agent/agentproc.(*Process).Niceness
2024-09-12 17:21:51.426 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/proc_unix.go:107
2024-09-12 17:21:51.426 [info]  - get priority:
2024-09-12 17:21:51.426 [info]  github.com/coder/coder/v2/agent/agentproc.UnixSyscaller.GetPriority
2024-09-12 17:21:51.426 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/syscaller_unix.go:30
2024-09-12 17:21:51.426 [info]  - no such process
2024-09-12 17:22:05.420 [erro]  manage process priority ...
2024-09-12 17:22:05.422 [info]  error= list:
2024-09-12 17:22:05.422 [info]  github.com/coder/coder/v2/agent.(*agent).manageProcessPriority
2024-09-12 17:22:05.422 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agent.go:1652
2024-09-12 17:22:05.423 [info]  - read oom_score_adj:
2024-09-12 17:22:05.423 [info]  github.com/coder/coder/v2/agent/agentproc.List
2024-09-12 17:22:05.423 [info]  /go/pkg/mod/github.com/coder/coder/v2@v2.14.3/agent/agentproc/proc_unix.go:61
2024-09-12 17:22:05.423 [info]  - read /proc/165262/oom_score_adj: no such process

There isn't much else actively logging to the workspace log, so this tends to be the biggest source of logs even if it it is rate-limited.

@aaronlehmann
Copy link
Contributor Author

@sreya: Not sure if 2.15.3 is supposed to have the complete set of fixes for this, but with that release I'm seeing a lot of messages like this:


2024-10-14 18:05:44.106 [warn]  unable to set oom_score_adj  cmd="bash "  pid=59723  score=900  error=
2024-10-14 18:06:40.107 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:07:40.107 [warn]  unable to get proc niceness  cmd="@dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only "  pid=156  error=
2024-10-14 18:08:41.107 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:09:41.107 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:10:41.107 [warn]  unable to get proc niceness  cmd="/home/coder/.vscode-server/cli/servers/Stable-d78a74bcdfad14d5d3b1b782f87255d802b57511/server/node /home/coder/.vscode-server/cli/servers/Stable-d78a74bcdfad14d5d3b1b782f87255d802b57511/server/out/bootstrap-fork --type=ptyHost --logsPath /home/coder/.vscode-server/data/logs/20241014T151313 "  pid=668  error=
2024-10-14 18:11:42.107 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:12:43.107 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:13:44.106 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:14:44.106 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:15:45.107 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:16:46.107 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:17:46.107 [warn]  unable to get proc niceness  cmd="/sbin/init "  pid=1  error=
2024-10-14 18:18:46.107 [warn]  unable to get proc niceness  cmd="/lib/systemd/systemd-journald "  pid=86  error=

@sreya
Copy link
Collaborator

sreya commented Oct 15, 2024

@aaronlehmann 🤦 I missed something dumb. Do you want a fix now or would you be open to wait a couple of weeks? I'm going to be redoing this feature from the ground up and it'll eliminate any issues related to spam.

@sreya
Copy link
Collaborator

sreya commented Oct 15, 2024

@aaronlehmann I merged #15071 in case you want to opt for the patch

@aaronlehmann
Copy link
Contributor Author

No worries... as a fellow Go programmer, I've definitely made this same mistake more than once.

If it's not a lot of work to make a patch release, or you end up doing a 2.15.4 anyway for other reasons, I think this fix would be nice to have. I'd also be interested in getting #14678 if there is an upcoming patch release.

@sreya
Copy link
Collaborator

sreya commented Oct 15, 2024

Let me check with @stirby, I'll go ahead and get it cherry-picked in case we do 👍

@stirby
Copy link
Collaborator

stirby commented Oct 15, 2024

@aaronlehmann I expect us to patch 2.15 next week with a few fixes, adding this to the list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
observability Issues related to observability (metrics, dashboards, alerts, opentelemetry)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants