Skip to content

Multi-replica networking breaks if using external DERP and CODER_DERP_SERVER_RELAY_URL is unset #10810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
spikecurtis opened this issue Nov 21, 2023 · 1 comment · Fixed by #10834
Assignees
Labels
networking Area: networking s2 Broken use cases or features (with a workaround). Only humans may set this.

Comments

@spikecurtis
Copy link
Contributor

In license entitlements/enablements calculation, we use the CODER_DERP_SERVER_RELAY_URL to determine whether to enable the multi-replica (high availability) variant of the tailnet coordinator.

When Coder is deployed with an external DERP server and the embedded relay is disabled, then this variable doesn't sound like it needs to be set.

@spikecurtis spikecurtis added s2 Broken use cases or features (with a workaround). Only humans may set this. bug networking Area: networking labels Nov 21, 2023
@spikecurtis
Copy link
Contributor Author

I think we did it this way to avoid an explicit enablement of the HA feature.

The tailnet coordinators don't depend on replicasync, so we could conditionally disable replicasync & derpmesh based on CODER_DERP_SERVER_RELAY_URL, but still enable HA coordinators.

One option is to just unconditionally enable the HA feature, so you'd get the HA coordinator if you're licensed for it.

The in-memory, non-HA coordinator probably has lower latency than the PG Coordinator, since we have to query the database, so enterprise customers might want to disable it for single-replica deployments, but we could start by default-enabling the HA coordinator and add support later to disable it if anyone complains. Latency setting up connections matters, but I don't believe the coordinator contributes significantly at this point for reasonable postgres round-trip-time.

@spikecurtis spikecurtis self-assigned this Nov 21, 2023
spikecurtis added a commit that referenced this issue Nov 22, 2023
fixes #10810

The tailnet coordinators don't depend on replicasync, so we can still enable HA coordinators even if the relay URL is unset.

The in-memory, non-HA coordinator probably has lower latency than the PG Coordinator, since we have to query the database, so enterprise customers might want to disable it for single-replica deployments, but this PR default-enables the HA coordinator.  We could add support later to disable it if anyone complains. Latency setting up connections matters, but I don't believe the coordinator contributes significantly at this point for reasonable postgres round-trip-time.
ericpaulsen pushed a commit to lbi22/coder that referenced this issue Nov 27, 2023
fixes coder#10810

The tailnet coordinators don't depend on replicasync, so we can still enable HA coordinators even if the relay URL is unset.

The in-memory, non-HA coordinator probably has lower latency than the PG Coordinator, since we have to query the database, so enterprise customers might want to disable it for single-replica deployments, but this PR default-enables the HA coordinator.  We could add support later to disable it if anyone complains. Latency setting up connections matters, but I don't believe the coordinator contributes significantly at this point for reasonable postgres round-trip-time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
networking Area: networking s2 Broken use cases or features (with a workaround). Only humans may set this.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant