WebRTC (Web Real-Time Communication) enables delivery of audio- and video-conferencing applications using native web technologies, but a lesser-known feature is that it can tunnel arbitrary data as well. This is the story of how and why we migrated our networking from a traditional reverse proxy architecture over to WebRTC, and what we learned in the process.
Coder orchestrates development environments on your existing Kubernetes infrastructure, whether self-hosted or in the cloud. As with many cloud-native applications, Coder relied on a reverse proxy, which we called the “envproxy,” to route traffic from outside the cluster into running workspaces. For illustrative purposes, you can think of Coder as a control plane (the service we call “coderd”, which provides a dashboard for users to start and stop their workspaces) and a data plane (the envproxy and workspace itself).
In this context, the data plane refers to components that are in the critical path of the developer’s workflow, meaning that any failure would disrupt a user’s development process. The proxy architecture gave us a central point to enforce access controls (authentication and authorization checks) and perform audit logging, but it had some undesirable consequences as well:
While this architecture was straightforward to implement and served us well in our early stages, we started to encounter more complexity in our customers’ use cases that caused a poor user experience. Multiple points of failure meant that things could fail without being clearly diagnosable, and sometimes the failures would not be apparent to users and administrators. Customers occasionally had to modify network policies to suit their use case, which required them to understand Coder’s implementation details (such as the envproxy and workspace agent). In order to support a wide range of customer use cases and demands, we needed to embrace a distributed systems architecture to improve performance and reliability.
At Coder, we eat our own cooking, so many of our engineers use SSH forwarding to tunnel PostgreSQL traffic or pass through an SSH agent socket from their local workstations to their cloud workspace. While this approach generally worked well, the envproxy always introduced a potential failure mode and inhibited upgrades, since shutting it down gracefully meant waiting until all in-flight connections had closed. We knew that a peer-to-peer architecture would resolve these challenges, while also providing a secure-by-default installation and reducing effort for system operators.
We have long been fans of the fantastic WireGuard protocol, since it solves the same fundamental challenge that we face with developer workspaces: securely providing end-to-end connectivity, across untrusted and unreliable networks, supporting a mesh operation mode and network roaming. Unfortunately, browsers do not include native support for WireGuard, and implementing it would be non-trivial, since browsers do not allow transmission of arbitrary UDP or even TCP traffic. While we could potentially compile WireGuard to WebAssembly and encapsulate it over a WebSocket connection, this would require us to implement a proxy on the receiving end as well. Given these challenges, we knew we needed to pursue another path.
Our friends at Discord use the WebRTC protocol to stream media in real time, and we realized that Coder has similar requirements in terms of latency, security, browser and device support, and compatibility with diverse networking configurations. If we could tunnel arbitrary protocols and connections over WebRTC, then we could leverage the existing ecosystem and capabilities pioneered by browser vendors and innovative teams such as Discord. Moreover, since browsers provide built-in WebRTC APIs, we would be able to modify our open source code-server project to use it as an underlying transport, providing an even faster editing experience, completely transparently. Better still, we would be able to provide end-to-end encryption and minimize latency on local networks, even behind gateways using Network Address Translation (NAT), through WebRTC-compatible technologies including STUN, TURN relays, and DTLS, which we will discuss later in this article.
With our new networking model, Coder uses WebRTC to broker connections between the user and services running inside their workspace. In order to maximize compatibility and provide a clean migration path, our initial implementation uses the excellent (and open source) Pion TURN relay server project. The relay provides a rendezvous point that is accessible to both the user and the workspace they are trying to access, and which may be located either inside or outside of the Kubernetes cluster, as long as it is able to receive inbound connections from both:
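To make the rendezvous idea concrete, here is a minimal in-memory sketch in Python. It is not how TURN, Pion, or Coder work on the wire. It only models the property described here: the workspace agent dials out to the relay, the user presents a token, and the relay forwards traffic between the two without either side connecting directly to the other. The names `ToyRelay`, `register_agent`, `issue_token`, and `send_from_user` are hypothetical and exist only for this illustration.

```python
import secrets


class ToyRelay:
    """Illustrative rendezvous point; NOT a TURN implementation."""

    def __init__(self):
        self._inboxes = {}  # workspace id -> list of payloads delivered to the agent
        self._tokens = {}   # token -> workspace id the token authorizes

    def register_agent(self, workspace_id):
        # The agent connects OUT to the relay; the relay never dials into the workspace.
        inbox = []
        self._inboxes[workspace_id] = inbox
        return inbox

    def issue_token(self, workspace_id):
        # Hypothetical stand-in for whatever credential authorizes the user.
        token = secrets.token_urlsafe(16)
        self._tokens[token] = workspace_id
        return token

    def send_from_user(self, token, payload):
        # The user also connects to the relay and proves authorization with the token.
        workspace_id = self._tokens.get(token)
        if workspace_id is None:
            raise PermissionError("token does not authorize any workspace")
        inbox = self._inboxes.get(workspace_id)
        if inbox is None:
            raise ConnectionError("workspace agent is not connected to the relay")
        inbox.append(payload)


relay = ToyRelay()
agent_inbox = relay.register_agent("workspace-1")
token = relay.issue_token("workspace-1")
relay.send_from_user(token, b"ssh handshake bytes")
print(agent_inbox)
```

In this toy model, a request carrying an unknown token is rejected before it reaches any workspace, which mirrors the idea that the relay only bridges connections that have been authorized.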
An agent running inside the workspace establishes an outbound connection to the relay service, and the user connects to the same relay with a token to authorize the connection to the endpoint. While both the user and the workspace must be able to connect to the relay, it is not a requirement for the workspace container to connect directly to the user’s workstation or for the user’s workstation to connect directly to the workspace container, which simplifies network administration. This approach means that:
While we believe that our new approach to networking will yield significant improvements to the experience of installing and operating a Coder deployment, we’re just getting started. In future releases, we want to explore:
In Coder’s 1.21 release, and as a fallback when direct connectivity is not possible due to network conditions, traffic will flow through the TURN relay as depicted on the left. In future releases, users will be able to connect directly to their workspaces, relying on the STUN service to broker the connection and traverse NAT gateways. As a result, the relay will not be part of the data plane and interruptions to the STUN/TURN relay process will not affect connectivity to the workspace.
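The preference for direct paths over the relay is built into ICE, the negotiation procedure WebRTC uses to pick a route. The sketch below computes candidate priorities with the formula and recommended type preferences from RFC 8445 (sections 5.1.2.1 and 5.1.2.2). It is a simplification under stated assumptions: real ICE ranks candidate pairs rather than single candidates, implementations may choose different preference values, and nothing here is taken from Coder's code. The function names are hypothetical.

```python
# Recommended type preferences from RFC 8445, section 5.1.2.2.
TYPE_PREFERENCE = {
    "host": 126,   # address on a local interface (direct)
    "prflx": 110,  # peer-reflexive, learned during connectivity checks
    "srflx": 100,  # server-reflexive, the public address discovered via STUN
    "relay": 0,    # address allocated on a TURN relay
}


def candidate_priority(kind, local_preference=65535, component_id=1):
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_preference << 8) + (256 - component_id)


def preferred_order(kinds):
    """Sort candidate types from most to least preferred."""
    return sorted(kinds, key=candidate_priority, reverse=True)


for kind in preferred_order(["relay", "srflx", "host"]):
    print(kind, candidate_priority(kind))
```

With these values the relayed candidate always sorts last, so it is used only when no direct or STUN-discovered path passes connectivity checks, which matches the fallback role described above.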
We’re excited to bring the beginning of these improvements to Coder 1.21. We’ll be talking about our networking improvements on this week’s edition of Coffee and Coder - join us on Twitch with any questions you have for us and to learn more.