Commit da81080

add suggestions from review
1 parent 30a6207 commit da81080

1 file changed: +16 -11 lines changed

docs/tutorials/best-practices/scale-coder.md

@@ -16,8 +16,8 @@ end-user experience and measure the effects of modifications you make to your
 deployment.
 
 - Log output
-- Capture log output from Loki, CloudWatch logs, and other tools on your Coder Server
-instances and external provisioner daemons and store them in a searchable log store.
+- Capture log output from from Coder Server instances and external provisioner daemons
+and store them in a searchable log store like Loki, CloudWatch logs, or other tools.
 - Retain logs for a minimum of thirty days, ideally ninety days.
 This allows you to look back to see when anomalous behaviors began.
 
@@ -42,13 +42,15 @@ Edit your Helm `values.yaml` to capture metrics from Coder Server and external p
 CODER_PROMETHEUS_COLLECT_DB_METRICS=true
 ```
 
-1. Configure agent stats to avoid large cardinality:
+1. For a high scale deployment, configure agent stats to avoid large cardinality or disable them:
 
-```yaml
-CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY=agent_name
-```
+- Configure agent stats:
+
+```yaml
+CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY=agent_name
+```
 
-- To disable agent stats:
+- Disable agent stats:
 
 ```yaml
 CODER_PROMETHEUS_COLLECT_AGENT_STATS=false
@@ -68,10 +70,13 @@ they affect the end-user experience.
 - CPU and Memory Utilization
 - Monitor the utilization as a fraction of the available resources on the instance.
 
-Utilization will vary with use throughout the course of a day, week, and longer timelines. Monitor trends and pay special attention to the daily and weekly peak utilization. Use long-term trends to plan infrastructure upgrades.
+Utilization will vary with use throughout the course of a day, week, and longer timelines.
+Monitor trends and pay special attention to the daily and weekly peak utilization.
+Use long-term trends to plan infrastructure upgrades.
 
 - Tail latency of Coder Server API requests
-- High tail latency can indicate Coder Server or the PostgreSQL database is low on resources.
+- High tail latency can indicate Coder Server or the PostgreSQL database is underprovisioned
+for the load.
 - Use the `coderd_api_request_latencies_seconds` metric.
 
 - Tail latency of database queries
@@ -82,8 +87,8 @@ they affect the end-user experience.
 
 ### Locality
 
-To ensure increased availability of the Coder API, deploy at least three
-instances. Spread the instances across nodes with anti-affinity rules in
+If increased availability of the Coder API is a concern, deploy at least three
+instances of Coder Server. Spread the instances across nodes with anti-affinity rules in
 Kubernetes or in different availability zones of the same geographic region.
 
 Do not deploy in different geographic regions.
