Multi-replica networking breaks if using external DERP and CODER_DERP_SERVER_RELAY_URL is unset #10810

spikecurtis · 2023-11-21T05:57:47Z

In the license entitlements/enablements calculation, we use CODER_DERP_SERVER_RELAY_URL to determine whether to enable the multi-replica (high availability) variant of the tailnet coordinator.

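When Coder is deployed with an external DERP server and the embedded relay is disabled, it doesn't seem like this variable should need to be set. If it is left unset, though, the HA coordinator is not enabled and multi-replica networking breaks.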
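To make the failure mode concrete, here is a minimal sketch of the gating described above. It is not the actual Coder code: the function name `haCoordinatorEnabled` and its parameters are hypothetical, and the only behavior taken from this issue is that the relay URL decides whether the HA coordinator is used.

```go
package main

import "fmt"

// haCoordinatorEnabled is a hypothetical stand-in for the entitlement check
// described in this issue: the HA coordinator is used only when the
// deployment is licensed for HA AND a relay URL is configured.
func haCoordinatorEnabled(licensed bool, relayURL string) bool {
	return licensed && relayURL != ""
}

func main() {
	// External DERP with the embedded relay disabled: the relay URL is
	// left unset, so the HA coordinator is not selected even though the
	// deployment is licensed and runs multiple replicas.
	fmt.Println(haCoordinatorEnabled(true, ""))

	// The same deployment with a relay URL set (example value only).
	fmt.Println(haCoordinatorEnabled(true, "http://10.0.0.1:3000"))
}
```

Under this sketch, a licensed multi-replica deployment with no relay URL falls back to the in-memory coordinator on each replica, which matches the breakage in the title.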

spikecurtis · 2023-11-21T06:15:24Z

I think we did it this way to avoid an explicit enablement of the HA feature.

The tailnet coordinators don't depend on replicasync, so we could conditionally disable replicasync & derpmesh based on CODER_DERP_SERVER_RELAY_URL, but still enable HA coordinators.

One option is to just unconditionally enable the HA feature, so you'd get the HA coordinator if you're licensed for it.

The in-memory, non-HA coordinator probably has lower latency than the PG Coordinator, since the PG Coordinator has to query the database, so enterprise customers might want to disable the HA coordinator for single-replica deployments. We could start by enabling the HA coordinator by default and add support for disabling it later if anyone complains. Latency when setting up connections matters, but I don't believe the coordinator contributes significantly to it at this point, given a reasonable Postgres round-trip time.
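The decoupling proposed above could look roughly like the following sketch. All names here (`multiReplicaFeatures`, `resolveFeatures`) are hypothetical and are not taken from the Coder codebase. It also assumes replicasync and derpmesh stay gated on both the license and the relay URL, which this thread suggests but does not confirm.

```go
package main

import "fmt"

// multiReplicaFeatures is a hypothetical summary of which multi-replica
// components a server would start.
type multiReplicaFeatures struct {
	HACoordinator bool // PG-backed tailnet coordinator
	ReplicaSync   bool // replicasync
	DERPMesh      bool // derpmesh between embedded relays
}

// resolveFeatures sketches the proposal: the HA coordinator depends only on
// the license, while replicasync and derpmesh additionally require a relay
// URL (assumption: they remain license-gated as well).
func resolveFeatures(licensed bool, relayURL string) multiReplicaFeatures {
	relaySet := relayURL != ""
	return multiReplicaFeatures{
		HACoordinator: licensed,
		ReplicaSync:   licensed && relaySet,
		DERPMesh:      licensed && relaySet,
	}
}

func main() {
	// External DERP, relay URL unset: the HA coordinator is still enabled.
	fmt.Printf("%+v\n", resolveFeatures(true, ""))
	// Embedded relay with a relay URL set (example value only).
	fmt.Printf("%+v\n", resolveFeatures(true, "http://10.0.0.1:3000"))
}
```

With this split, a licensed deployment that uses an external DERP server still gets the HA coordinator, and the relay URL only controls the components that need it.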

fixes #10810

The tailnet coordinators don't depend on replicasync, so we can still enable HA coordinators even if the relay URL is unset. The in-memory, non-HA coordinator probably has lower latency than the PG Coordinator, since the PG Coordinator has to query the database, so enterprise customers might want to disable the HA coordinator for single-replica deployments, but this PR enables it by default. We could add support for disabling it later if anyone complains. Latency when setting up connections matters, but I don't believe the coordinator contributes significantly to it at this point, given a reasonable Postgres round-trip time.

spikecurtis added the s2, bug, and networking labels Nov 21, 2023

spikecurtis self-assigned this Nov 21, 2023

spikecurtis mentioned this issue Nov 22, 2023

fix: enable FeatureHighAvailability if it is licensed #10834

Merged

spikecurtis closed this as completed in #10834 Nov 22, 2023
