Prometheus metrics endpoint returns 500 instead of metrics #11451

Closed
janLo opened this issue Jan 5, 2024 · 5 comments · Fixed by #11508
janLo commented Jan 5, 2024

Our instances stopped responding to metrics requests:

❯ curl http://coder-service2.xxx:2112/
An error has occurred while serving metrics:

132 error(s) occurred:
* collected metric "agent_sessions_errors_total" { label:{name:"agent_name"  value:"main"}  label:{name:"error_type"  value:"output_io_copy"}  label:{name:"magic_type"  value:"ssh"}  label:{name:"pty"  value:"yes"}  label:{name:"template_name"  value:"debian-like-buildenv"}  label:{name:"username"  value:"xxx"}  label:{name:"workspace_name"  value:"muts"}  counter:{value:11}} was collected before with the same name and label values
* collected metric "agent_sessions_total" { label:{name:"agent_name"  value:"main"}  label:{name:"magic_type"  value:"ssh"}  label:{name:"pty"  value:"no"}  label:{name:"template_name"  value:"debian-like-buildenv"}  label:{name:"username"  value:"xxx"}  label:{name:"workspace_name"  value:"muts"}  counter:{value:6}} was collected before with the same name and label values
* collected metric "agent_sessions_total" { label:{name:"agent_name"  value:"main"}  label:{name:"magic_type"  value:"ssh"}  label:{name:"pty"  value:"yes"}  label:{name:"template_name"  value:"debian-like-buildenv"}  label:{name:"username"  value:"xxx"}  label:{name:"workspace_name"  value:"muts"}  counter:{value:18}} was collected before with the same name and label values
[...]

It seems to have started with the upgrade to 2.6.0.

cdr-bot (bot) added the bug label Jan 5, 2024
janLo changed the title from "Prometheus metrics endpoint resturns 500 instead of metrics" to "Prometheus metrics endpoint returns 500 instead of metrics" Jan 5, 2024
johnstcn commented Jan 5, 2024

@janLo do you have CODER_PROMETHEUS_COLLECT_AGENT_STATS=true? I think you should be able to set this to false while we investigate the root cause.

janLo commented Jan 5, 2024

Thank you, I'll try that on Monday. One thing to note: I upgraded from 2.5.0.

spikecurtis self-assigned this Jan 8, 2024
spikecurtis added the s1 label ("Bugs that break core workflows. Only humans may set this.") Jan 8, 2024
Emyrk commented Jan 8, 2024

@spikecurtis are you working on this, or do you want me to take this up?

spikecurtis added a commit that referenced this issue Jan 9, 2024
Fixes #11451

A refactor of the Agent API passes metrics as protobufs, which include pointers to label name/value pairs. The aggregator tested for sameness by doing a shallow compare of label values, so labels from different stats reports compared unequal even when their names and values matched, because the pointers differed.

This fix does a deep compare.

While testing I also noted that we neglect to compare template names. This is unlikely to have caused any issue in practice, since the combination of username/workspace is unique, but in the context of comparing metric labels we should do the comparison.

If a user creates a workspace, deletes it, then recreates from a different template, we could in principle have reported incorrect stats for the old template.
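
The shallow-versus-deep distinction described in the commit message can be illustrated with a small, self-contained Go sketch. It is not the Coder aggregator code: the Label and Metric types, the upsert helper, and the report function are hypothetical stand-ins, and the plain structs stand in for the generated protobuf types. The sketch also assumes labels arrive in the same order in every report.

```go
package main

import "fmt"

// Label is a hypothetical stand-in for a protobuf label message.
type Label struct{ Name, Value string }

// Metric is a hypothetical stand-in for one reported agent metric.
type Metric struct {
	Name   string
	Labels []*Label
	Value  float64
}

// shallowSame compares the label pointers themselves, so two separately
// allocated label sets never match, even when their contents are identical.
func shallowSame(a, b []*Label) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// deepSame compares the name and value each pointer refers to.
func deepSame(a, b []*Label) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] == nil || b[i] == nil {
			if a[i] != b[i] {
				return false
			}
			continue
		}
		if *a[i] != *b[i] {
			return false
		}
	}
	return true
}

// upsert replaces the stored metric that matches m, or appends m if none matches.
func upsert(store []Metric, m Metric, same func(a, b []*Label) bool) []Metric {
	for i := range store {
		if store[i].Name == m.Name && same(store[i].Labels, m.Labels) {
			store[i] = m
			return store
		}
	}
	return append(store, m)
}

// report builds a metric with freshly allocated labels, the way each
// decoded stats report would carry its own label structs.
func report(value float64) Metric {
	return Metric{
		Name:   "agent_sessions_total",
		Labels: []*Label{{Name: "magic_type", Value: "ssh"}, {Name: "pty", Value: "yes"}},
		Value:  value,
	}
}

func main() {
	var shallow, deep []Metric
	for _, v := range []float64{17, 18} {
		shallow = upsert(shallow, report(v), shallowSame)
		deep = upsert(deep, report(v), deepSame)
	}
	fmt.Println(len(shallow), len(deep)) // prints "2 1"
}
```

With the shallow compare, the second report is stored as a new series with the same name and label values as the first, which is the kind of duplicate the metrics endpoint rejects with "was collected before with the same name and label values". For real protobuf messages, comparing field values or using a protobuf-aware equality helper would be more appropriate than dereferencing and comparing the generated structs directly.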
janLo commented Jan 10, 2024

Thank you, looking forward to the release!

@johnstcn
Released in 2.7.0: https://github.com/coder/coder/releases/tag/v2.7.0
