Drill-down view: workspace network latency & disconnects #6724
I'm going with exposing agent metrics via a Prometheus endpoint.
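To show what "exposing agent metrics via a Prometheus endpoint" involves, here is a minimal, stdlib-only sketch that renders a per-agent gauge in the Prometheus text exposition format and serves it over HTTP. The metric name `agent_connection_latency_seconds`, the `agent_id` label, and the `agentGauge` type are hypothetical and are not what Coder actually ships. A real implementation would more likely use `github.com/prometheus/client_golang` (a `GaugeVec` registered on a registry and served with `promhttp`) than hand-roll the format.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
	"sync"
)

// agentGauge is a hypothetical in-memory gauge keyed by agent ID.
type agentGauge struct {
	mu     sync.Mutex
	name   string
	help   string
	values map[string]float64 // agent ID -> latest reported value
}

func newAgentGauge(name, help string) *agentGauge {
	return &agentGauge{name: name, help: help, values: map[string]float64{}}
}

// Set records the latest value reported by one agent.
func (g *agentGauge) Set(agentID string, v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values[agentID] = v
}

// labelEscaper escapes backslash, double quote and newline in label values,
// as required by the Prometheus text format.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

func escapeLabel(s string) string { return labelEscaper.Replace(s) }

// Render writes the gauge in the text exposition format, sorted by agent ID
// so the output is deterministic.
func (g *agentGauge) Render() string {
	g.mu.Lock()
	defer g.mu.Unlock()
	ids := make([]string, 0, len(g.values))
	for id := range g.values {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", g.name, g.help)
	fmt.Fprintf(&b, "# TYPE %s gauge\n", g.name)
	for _, id := range ids {
		fmt.Fprintf(&b, "%s{agent_id=\"%s\"} %g\n", g.name, escapeLabel(id), g.values[id])
	}
	return b.String()
}

// ServeHTTP lets the gauge be mounted directly as a /metrics handler.
func (g *agentGauge) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprint(w, g.Render())
}

func main() {
	g := newAgentGauge("agent_connection_latency_seconds", "Latest agent connection latency.")
	g.Set("agent-b", 0.042)
	g.Set("agent-a", 0.015)
	fmt.Print(g.Render())
	// To serve it: http.Handle("/metrics", g); http.ListenAndServe(":2112", nil)
}
```

Keeping the latest value per agent (a gauge) suits "current latency"; counts of events such as disconnects would instead be monotonically increasing counters.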
Status update: We now have agent connection, latency, and session stats exposed via Prometheus. The next step is collecting and exposing details related to agent timeouts.
Plan for this week: implement a metrics collector in coderd to aggregate metrics from agents. Let's collect
Hey @mafredri! Did you identify any places in the code where we could inject extra metrics/counters to make debugging agent timeouts easier? I'm wondering if there are any gaps that could be addressed here.
@mtojek not really. I still haven’t managed to isolate the problem and can’t really say which metric would help. That said, agent metrics are only one side of the coin; knowing what the client sees could help in such situations. Right now I’m adding logging on the client side. But we probably shouldn’t be sending client metrics to the server, at least not normally.
There is a significant problem with logging on the agent side. Admins have to ask users to copy their logs, or at least review them for issues. We definitely need something centralized, ideally a sink for logs, but I wouldn't mind some extra metrics, even vague ones like
Battle plan:
Related #4680,
I'm going to detach #7581 from the plan and keep it as a separate issue, hence resolving it.
Prometheus metrics for network disconnects (including the reason) and user latency in workspaces, with drill-down by workspace, IDE, and connection type.
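The drill-down requirement amounts to recording each disconnect and latency sample with labels for workspace, IDE, and connection type (plus the reason for disconnects), then filtering or summing over those labels. Below is a minimal in-memory sketch of that label model; the type names, label names, and example values such as `"vscode"` or `"timeout"` are hypothetical. With `client_golang` the same shape would typically be a `CounterVec` for disconnects and a `HistogramVec` for latency, with the drill-down done in PromQL.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// connLabels is a hypothetical label set for drilling into connections.
type connLabels struct {
	Workspace string
	IDE       string // e.g. "vscode", "jetbrains" (hypothetical values)
	ConnType  string // e.g. "ssh", "port-forward" (hypothetical values)
}

// disconnectKey adds the disconnect reason to the drill-down labels.
type disconnectKey struct {
	connLabels
	Reason string
}

type latencyStats struct {
	Count int
	Sum   float64 // seconds
}

// ConnMetrics holds disconnect counts and latency sums per label combination.
type ConnMetrics struct {
	mu          sync.Mutex
	disconnects map[disconnectKey]int
	latency     map[connLabels]latencyStats
}

func NewConnMetrics() *ConnMetrics {
	return &ConnMetrics{disconnects: map[disconnectKey]int{}, latency: map[connLabels]latencyStats{}}
}

func (m *ConnMetrics) RecordDisconnect(l connLabels, reason string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.disconnects[disconnectKey{connLabels: l, Reason: reason}]++
}

func (m *ConnMetrics) ObserveLatency(l connLabels, seconds float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	s := m.latency[l]
	s.Count++
	s.Sum += seconds
	m.latency[l] = s
}

// DisconnectsByReason drills down to one workspace and sums disconnects per
// reason across all IDEs and connection types.
func (m *ConnMetrics) DisconnectsByReason(workspace string) map[string]int {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := map[string]int{}
	for k, n := range m.disconnects {
		if k.Workspace == workspace {
			out[k.Reason] += n
		}
	}
	return out
}

// MeanLatency returns the average latency for one exact label combination.
func (m *ConnMetrics) MeanLatency(l connLabels) (float64, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.latency[l]
	if !ok || s.Count == 0 {
		return 0, false
	}
	return s.Sum / float64(s.Count), true
}

func main() {
	m := NewConnMetrics()
	vs := connLabels{Workspace: "ws-a", IDE: "vscode", ConnType: "ssh"}
	jb := connLabels{Workspace: "ws-a", IDE: "jetbrains", ConnType: "ssh"}
	m.RecordDisconnect(vs, "timeout")
	m.RecordDisconnect(jb, "timeout")
	m.RecordDisconnect(vs, "client_closed")
	m.ObserveLatency(vs, 0.020)
	m.ObserveLatency(vs, 0.040)

	reasons := m.DisconnectsByReason("ws-a")
	names := make([]string, 0, len(reasons))
	for r := range reasons {
		names = append(names, r)
	}
	sort.Strings(names)
	for _, r := range names {
		fmt.Printf("ws-a reason=%s disconnects=%d\n", r, reasons[r])
	}
	if mean, ok := m.MeanLatency(vs); ok {
		fmt.Printf("ws-a vscode/ssh mean latency %.3fs\n", mean)
	}
}
```

Keeping the reason as its own label (rather than folding it into the metric name) is what lets one query sum across reasons or filter to a single one; the trade-off is cardinality, since every distinct workspace/IDE/connection-type/reason combination becomes a separate series.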