|
1 |
| -# Scale Coder |
| 1 | +# Scale Coder |
2 | 2 |
|
3 | 3 | December 20, 2024
|
4 | 4 |
|
@@ -61,45 +61,40 @@ For a more comprehensive overview, integrate metrics with an observability dashb
|
61 | 61 |
|
62 | 62 | ### Observability key metrics
|
63 | 63 |
|
| 64 | +Configure alerting based on these metrics to ensure you surface problems before they affect the end-user experience. |
| 65 | + |
64 | 66 | **CPU and Memory Utilization**
|
65 | 67 |
|
66 |
| -- Monitor the utilization as a fraction of the available resources on the |
67 |
| - instance. Its utilization will vary with use throughout the day and over the |
68 |
| - course of the week. Monitor the trends, paying special attention to the daily |
69 |
| - and weekly peak utilization. Use long-term trends to plan infrastructure |
| 68 | +- Monitor the utilization as a fraction of the available resources on the instance. |
| 69 | + |
| 70 | + Utilization varies with use over the course of a day, a week, and longer periods. Monitor trends and pay special attention to the daily and weekly peak utilization. Use long-term trends to plan infrastructure |
70 | 71 | upgrades.
|
71 | 72 |
|
72 | 73 | **Tail latency of Coder Server API requests**
|
73 | 74 |
|
74 |
| -- Use the `coderd_api_request_latencies_seconds` metric. |
75 |
| -- High tail latency can indicate Coder Server or the PostgreSQL database is |
76 |
| - being starved for resources. |
| 75 | +- High tail latency can indicate Coder Server or the PostgreSQL database is low on resources. |
| 76 | + |
| 77 | + Use the `coderd_api_request_latencies_seconds` metric. |
77 | 78 |
|
78 | 79 | **Tail latency of database queries**
|
79 | 80 |
|
80 |
| -- Use the `coderd_db_query_latencies_seconds` metric. |
81 | 81 | - High tail latency can indicate the PostgreSQL database is low on resources.
|
82 | 82 |
|
83 |
| -Configure alerting based on these metrics to ensure you surface problems before |
84 |
| -end users notice them. |
| 83 | + Use the `coderd_db_query_latencies_seconds` metric. |
85 | 84 |
|
86 | 85 | ## Coder Server
|
87 | 86 |
|
88 | 87 | ### Locality
|
89 | 88 |
|
90 |
| -If increased availability of the Coder API is a concern, deploy at least three |
91 |
| -instances. Spread the instances across nodes (e.g. via anti-affinity rules in |
92 |
| -Kubernetes), and/or in different availability zones of the same geographic |
93 |
| -region. |
| 89 | +To ensure increased availability of the Coder API, deploy at least three instances. Spread the instances across nodes with anti-affinity rules in |
| 90 | +Kubernetes or in different availability zones of the same geographic region. |
| 91 | + |
| 92 | +Do not deploy in different geographic regions. |
94 | 93 |
|
95 |
| -Do not deploy in different geographic regions. Coder Servers need to be able to |
96 |
| -communicate with one another directly with low latency, under 10ms. Note that |
97 |
| -this is for the availability of the Coder API – workspaces will not be fault |
98 |
| -tolerant unless they are explicitly built that way at the template level. |
| 94 | +Coder Servers need to be able to |
| 95 | +communicate with one another directly with low latency, under 10ms. Note that this is for the availability of the Coder API. Workspaces are not fault tolerant unless they are explicitly built that way at the template level. |
99 | 96 |
|
100 |
| -Deploy Coder Server instances as geographically close to PostgreSQL as possible. |
101 |
| -Low-latency communication (under 10ms) with Postgres is essential for Coder |
102 |
| -Server's performance. |
| 97 | +Deploy Coder Server instances as geographically close to PostgreSQL as possible. Low-latency communication (under 10ms) with Postgres is essential for Coder Server's performance. |
103 | 98 |
|
104 | 99 | ### Scaling
|
105 | 100 |
|
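The two tail-latency items under "Observability key metrics" above refer to `coderd_api_request_latencies_seconds` and `coderd_db_query_latencies_seconds`. As a rough illustration of what "tail latency" means for such metrics, the sketch below estimates a quantile from cumulative histogram buckets by linear interpolation, similar in spirit to PromQL's `histogram_quantile`. It assumes the metrics are exposed as Prometheus-style histograms; the function name and the bucket values are hypothetical and are not taken from Coder.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound_seconds, cumulative_count) pairs,
    sorted by upper bound, whose last bound is math.inf.
    Simplified sketch: assumes the lowest bucket starts at 0 seconds.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf upper bound")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded

    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Quantile falls in the +Inf bucket: report the highest
                # finite bound, since nothing more precise is known.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical API request latency buckets (seconds, cumulative count).
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, example_buckets)
print(f"estimated p95 latency: {p95:.3f}s")
```

An alert on tail latency would compare an estimate like `p95` against a threshold chosen for the deployment. Any such threshold is a local decision and is not specified in the docs being changed here.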