agent metadata: add coder stat to identify Kubernetes cgroup usage #7076


Closed · 2 tasks done
bpmct opened this issue Apr 11, 2023 · 13 comments · Fixed by #8005

Comments

@bpmct (Member) commented Apr 11, 2023

Running free or top inside a container does not give accurate usage info, and reading cgroup values, particularly for CPU, can be tricky. A coder stat command can help:

  • finish coder stat for CPU, disk, and memory in Docker and Kubernetes containers
  • add it to our Docker and Kubernetes example templates
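
For illustration, a minimal Go sketch of the mismatch described above, assuming a cgroup v2 host (the paths and missing error handling are simplifications, not the eventual coder stat implementation):

    package main

    import (
    	"fmt"
    	"os"
    	"strings"
    )

    func main() {
    	// free and top read /proc/meminfo, which reports *host* totals
    	// even when run inside a container.
    	meminfo, _ := os.ReadFile("/proc/meminfo")
    	fmt.Println(strings.SplitN(string(meminfo), "\n", 2)[0]) // e.g. "MemTotal: 65536000 kB"

    	// The container's real budget lives in the cgroup filesystem
    	// (cgroup v2 path shown; the file reads "max" when unlimited).
    	limit, _ := os.ReadFile("/sys/fs/cgroup/memory.max")
    	fmt.Printf("cgroup memory limit: %s", limit)
    }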
@matifali (Member) commented Apr 28, 2023

For Docker and Kubernetes, this can be done by reading the cgroup filesystem. Check:

cat /sys/fs/cgroup/cpu.stat
cat /sys/fs/cgroup/memory.current
cat /sys/fs/cgroup/memory.max

Note: cgroup paths can differ across Linux kernel versions (the paths above are cgroup v2; older v1 hierarchies lay the files out differently). See the attached screenshot and https://stackoverflow.com/a/51251148/9183518

[screenshot: cgroup paths differing across kernel versions]
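
A rough Go sketch of the fallback logic this implies; the v1 paths follow the linked answer and are an assumption here, not necessarily what coder stat will ship:

    package main

    import (
    	"fmt"
    	"os"
    	"strconv"
    	"strings"
    )

    // readInt reads one integer from a cgroup file; ok is false if the
    // file is missing or holds a non-numeric value such as "max".
    func readInt(path string) (int64, bool) {
    	b, err := os.ReadFile(path)
    	if err != nil {
    		return 0, false
    	}
    	n, err := strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
    	return n, err == nil
    }

    func main() {
    	// cgroup v2 (unified hierarchy, newer kernels).
    	if used, ok := readInt("/sys/fs/cgroup/memory.current"); ok {
    		limit, _ := readInt("/sys/fs/cgroup/memory.max") // 0 means "max"/unlimited
    		fmt.Printf("v2: %d / %d bytes\n", used, limit)
    		return
    	}
    	// cgroup v1 fallback (e.g. the 5.4 kernel discussed below).
    	used, _ := readInt("/sys/fs/cgroup/memory/memory.usage_in_bytes")
    	limit, _ := readInt("/sys/fs/cgroup/memory/memory.limit_in_bytes")
    	fmt.Printf("v1: %d / %d bytes\n", used, limit)
    }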

@matifali (Member)

@deansheather, would you like to make Pittsburgh and Helsinki identical?

@deansheather (Member)

If this differs by kernel version, then we should probably use some other way to determine memory usage, for compatibility.

@deansheather (Member)

I will install the same kernel on Pittsburgh, though I can't restart it during work hours.

@deansheather (Member)

Nope, I can't install the same kernel version because it doesn't install on fossa (Ubuntu 20.04) due to uninstallable dependencies, and it's not worth upgrading the entire server unless we have to, IMO.

I could try compiling the kernel myself at some point though

@deansheather (Member)

The image is jammy (Ubuntu 22.04), but the hosts run different OSes. Only the Pittsburgh region runs the old kernel and the 20.04 release.

@matifali (Member)

I added the metadata, but most of the devs are in the US, so it doesn't work for them. 😞

@matifali (Member) commented May 12, 2023

Now that I have a way to show correct memory and CPU usage using cgroups, do we still need to implement coder stat?

These stats are available in the dev.coder.com dogfood template in all regions except Pittsburgh (it's still on kernel 5.4, where the cgroup paths are different).

cc: @ammario @bpmct

[screenshot: workspace metadata showing memory and CPU usage]
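
For reference, the CPU side can be derived by sampling the usage_usec counter from cpu.stat twice; this is a rough cgroup v2-only sketch, not the exact script used in the dogfood template:

    package main

    import (
    	"bufio"
    	"fmt"
    	"os"
    	"strconv"
    	"strings"
    	"time"
    )

    // usageUsec parses the usage_usec counter from cpu.stat (cgroup v2).
    func usageUsec() int64 {
    	f, err := os.Open("/sys/fs/cgroup/cpu.stat")
    	if err != nil {
    		return 0
    	}
    	defer f.Close()
    	sc := bufio.NewScanner(f)
    	for sc.Scan() {
    		fields := strings.Fields(sc.Text())
    		if len(fields) == 2 && fields[0] == "usage_usec" {
    			n, _ := strconv.ParseInt(fields[1], 10, 64)
    			return n
    		}
    	}
    	return 0
    }

    func main() {
    	// Sample the cumulative counter over a one-second window; the
    	// delta in CPU-microseconds per wall-clock second is cores used.
    	before := usageUsec()
    	time.Sleep(time.Second)
    	delta := usageUsec() - before
    	fmt.Printf("cpu usage: %.2f cores\n", float64(delta)/1e6)
    }

Dividing that by the quota from cpu.max would turn it into a percentage of the container's allowance.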

@sreya self-assigned this on May 12, 2023
@bpmct (Member, Author) commented May 12, 2023

@sreya will look into this. I'm a bit concerned that, without a coder stat, it's difficult for a user to know which command to use for which kernel, and the info won't be accurate. If we can provide good docs, plus a command that doesn't require uncommon dependencies and is easy for users to use, I'm fine with that.

@sreya (Collaborator) commented May 19, 2023

@johnstcn expressed interest in doing this, so I'm re-assigning it to him.


@bpmct (Member, Author) commented May 23, 2023

coder stat is nice because I assume it can work in any containerized environment (e.g. Docker, ECS, GKE Autopilot) and does not require metrics-server. However, we may find cases where it is better to get metrics from Kubernetes for accuracy, performance, or availability reasons.

An alternative (or additional enhancement) could be to extend https://github.com/coder/coder-logstream-kube to query the Kubernetes metrics-server and send metadata to the workspace.
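
If we went the metrics-server route, the query itself would be small. Here's a sketch using the k8s.io/metrics client, assuming in-cluster credentials and RBAC access to metrics.k8s.io; the "coder" namespace and "workspace-pod" name are placeholders, and this is not how coder-logstream-kube works today:

    package main

    import (
    	"context"
    	"fmt"

    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/rest"
    	metrics "k8s.io/metrics/pkg/client/clientset/versioned"
    )

    func main() {
    	// Build a client from the pod's in-cluster service account.
    	cfg, err := rest.InClusterConfig()
    	if err != nil {
    		panic(err)
    	}
    	mc, err := metrics.NewForConfig(cfg)
    	if err != nil {
    		panic(err)
    	}
    	// Fetch current usage for one workspace pod from metrics-server.
    	pm, err := mc.MetricsV1beta1().PodMetricses("coder").
    		Get(context.Background(), "workspace-pod", metav1.GetOptions{})
    	if err != nil {
    		panic(err)
    	}
    	for _, c := range pm.Containers {
    		fmt.Printf("%s: cpu=%s memory=%s\n", c.Name, c.Usage.Cpu(), c.Usage.Memory())
    	}
    }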

@mtojek modified the milestones: 🧹 Sprint 0, ❓ Sprint 1 on Jun 12, 2023
@johnstcn (Member)

Status update: roughly 90% complete.
