
Commit f110af3

rearrange observability subheadings

1 parent 296d821 commit f110af3

File tree

1 file changed: +29 −28 lines


docs/tutorials/best-practices/scale-coder.md

Lines changed: 29 additions & 28 deletions
```diff
@@ -24,8 +24,36 @@ deployment.
 - Metrics
   - Capture infrastructure metrics like CPU, memory, open files, and network I/O for all
     Coder Server, external provisioner daemon, workspace proxy, and PostgreSQL instances.
+  - Capture Coder Server and External Provisioner daemons metrics [via Prometheus](#how-to-capture-coder-server-metrics-with-prometheus).
 
-### Capture Coder server metrics with Prometheus
+Retain metric time series for at least six months. This allows you to see
+performance trends relative to user growth.
+
+For a more comprehensive overview, integrate metrics with an observability
+dashboard like [Grafana](../../admin/monitoring/index.md).
+
+### Observability key metrics
+
+Configure alerting based on these metrics to ensure you surface problems before
+they affect the end-user experience.
+
+- CPU and Memory Utilization
+  - Monitor the utilization as a fraction of the available resources on the instance.
+
+    Utilization will vary with use throughout the course of a day, week, and longer timelines.
+    Monitor trends and pay special attention to the daily and weekly peak utilization.
+    Use long-term trends to plan infrastructure upgrades.
+
+- Tail latency of Coder Server API requests
+  - High tail latency can indicate Coder Server or the PostgreSQL database is underprovisioned
+    for the load.
+  - Use the `coderd_api_request_latencies_seconds` metric.
+
+- Tail latency of database queries
+  - High tail latency can indicate the PostgreSQL database is low in resources.
+  - Use the `coderd_db_query_latencies_seconds` metric.
+
+### How to capture Coder server metrics with Prometheus
 
 Edit your Helm `values.yaml` to capture metrics from Coder Server and external provisioner daemons with
 [Prometheus](../../admin/integrations/prometheus.md):
```
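The key-metrics guidance this commit moves around ("configure alerting based on these metrics") could be wired up as a Prometheus alerting rule. A minimal sketch, assuming `coderd_api_request_latencies_seconds` is exported as a Prometheus histogram (so the `_bucket` series exists); the group name, alert name, 0.5 s threshold, and `for` duration are illustrative assumptions, not values from the doc:

```yaml
groups:
  - name: coder-latency            # illustrative group name
    rules:
      - alert: CoderAPITailLatencyHigh
        # p99 API request latency over a 5m window; threshold is arbitrary
        # and should be tuned to your own baseline.
        expr: |
          histogram_quantile(0.99,
            sum(rate(coderd_api_request_latencies_seconds_bucket[5m])) by (le))
          > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Coder Server p99 API request latency above 500ms"
```

The same shape applies to `coderd_db_query_latencies_seconds` for the database-query alert.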
````diff
@@ -56,33 +84,6 @@ Edit your Helm `values.yaml` to capture metrics from Coder Server and external p
 CODER_PROMETHEUS_COLLECT_AGENT_STATS=false
 ```
 
-Retain metric time series for at least six months. This allows you to see
-performance trends relative to user growth.
-
-For a more comprehensive overview, integrate metrics with an observability
-dashboard like [Grafana](../../admin/monitoring/index.md).
-
-### Observability key metrics
-
-Configure alerting based on these metrics to ensure you surface problems before
-they affect the end-user experience.
-
-- CPU and Memory Utilization
-  - Monitor the utilization as a fraction of the available resources on the instance.
-
-    Utilization will vary with use throughout the course of a day, week, and longer timelines.
-    Monitor trends and pay special attention to the daily and weekly peak utilization.
-    Use long-term trends to plan infrastructure upgrades.
-
-- Tail latency of Coder Server API requests
-  - High tail latency can indicate Coder Server or the PostgreSQL database is underprovisioned
-    for the load.
-  - Use the `coderd_api_request_latencies_seconds` metric.
-
-- Tail latency of database queries
-  - High tail latency can indicate the PostgreSQL database is low in resources.
-  - Use the `coderd_db_query_latencies_seconds` metric.
-
 ## Coder Server
 
 ### Locality
````
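Both latency metrics named in the moved section are consumed through `histogram_quantile()`. As a side note on what that function actually computes, here is a small self-contained Python sketch of the same bucket interpolation; the bucket bounds and counts are invented sample data, not real coderd output:

```python
# Estimate a latency quantile from Prometheus-style cumulative histogram
# buckets, using the same linear interpolation histogram_quantile() applies.
# The sample data below is made up for illustration.

def histogram_quantile(q, buckets):
    """buckets: sorted list of (upper_bound_seconds, cumulative_count)."""
    total = buckets[-1][1]
    target = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= target:
            # Interpolate linearly within the bucket that crosses the target.
            frac = (target - prev_count) / (count - prev_count)
            return prev_bound + frac * (bound - prev_bound)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Invented buckets: 900 of 1000 requests under 50ms, a long tail up to 1s.
sample = [(0.05, 900), (0.1, 970), (0.5, 995), (1.0, 1000)]
p99 = histogram_quantile(0.99, sample)  # falls in the 0.1–0.5s bucket
```

With this sample data the p99 lands partway through the 0.1–0.5 s bucket, which is why a few slow requests dominate tail-latency alerts even when median latency looks healthy.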
