
feat(cli): use coder connect in coder ssh --stdio, if available #17572


Merged: ethanndickson merged 10 commits into main from ethan/coder-ssh-w-coder-connect on Apr 30, 2025


@ethanndickson ethanndickson commented Apr 28, 2025

Closes coder/vscode-coder#447
Closes coder/jetbrains-coder#543
Closes coder/coder-jetbrains-toolbox#21

This PR adds Coder Connect support to coder ssh --stdio.

When connecting to a workspace, if --force-new-tunnel is not passed, the CLI will first do a DNS lookup for <agent>.<workspace>.<owner>.<hostname-suffix>. If an IP address is returned, and it's within the Coder service prefix, the CLI will not create a new tailnet connection to the workspace, and instead dial the SSH server running on port 22 on the workspace directly over TCP.

This allows IDE extensions to use the Coder Connect tunnel, without requiring any modifications to the extensions themselves.

Additionally, using_coder_connect is added to the sshNetworkStats file, which the VS Code extension (and maybe JetBrains?) will be able to read in order to indicate to the user that they are using Coder Connect.
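For a consumer of that file, reading the new field could look something like the sketch below. It assumes the file is JSON with a boolean `using_coder_connect` key, as described above; the `networkStats` type and `parseNetworkStats` helper are illustrative names, and any other fields in the real file are simply ignored here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// networkStats models only the field discussed in this PR. The real file
// contains other fields that this sketch does not attempt to describe.
type networkStats struct {
	UsingCoderConnect bool `json:"using_coder_connect"`
}

// parseNetworkStats decodes the JSON contents of a network stats file.
func parseNetworkStats(data []byte) (networkStats, error) {
	var s networkStats
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	// Inline sample standing in for the contents of a stats file.
	s, err := parseNetworkStats([]byte(`{"using_coder_connect": true}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("using Coder Connect:", s.UsingCoderConnect)
}
```

A missing key decodes to `false`, so files written by older CLI versions would read as "not using Coder Connect".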

One advantage of this approach is that running coder ssh --stdio on an offline workspace with Coder Connect enabled will have the CLI wait for the workspace to build, the agent to connect (and optionally, for the startup scripts to finish), before finally connecting using the Coder Connect tunnel.

As a result, coder ssh --stdio has the overhead of looking up the workspace and agent, and checking if they are running. On my device, this meant coder ssh --stdio <workspace> was approximately a second slower than just connecting to the workspace directly using ssh <workspace>.coder (I would assume anyone serious about their Coder Connect usage would know to just do the latter anyway).

To ensure this doesn't come at a significant performance cost, I've also benchmarked this PR.

Benchmark

Methodology

All tests were completed on dev.coder.com, where a Linux workspace running in AWS us-west1 was created.
The machine running Coder Desktop (the 'client') was a Windows VM running in the same AWS region and VPC as the workspace.

To test the performance of specifically the SSH connection, a port was forwarded between the client and workspace using:

ssh -p 22 -L7001:localhost:7001 <host>

where host was either an alias for an SSH ProxyCommand that called coder ssh, or a Coder Connect hostname.

For latency, tcping was used against the forwarded port:

tcping -n 100 localhost 7001

For throughput, iperf3 was used:

iperf3 -c localhost -p 7001

where an iperf3 server was running on the workspace on port 7001.

Test Cases

Testcase 1: coder ssh ProxyCommand that bicopies from Coder Connect

This case tests the implementation in this PR, such that we can write a config like:

Host codercliconnect
    ProxyCommand /path/to/coder ssh --stdio workspace

With Coder Connect enabled, ssh -p 22 -L7001:localhost:7001 codercliconnect will use the Coder Connect tunnel. The results were as follows:

Throughput, 10 tests, back to back:

  • Average throughput across all tests: 788.20 Mbits/sec
  • Minimum average throughput: 731 Mbits/sec
  • Maximum average throughput: 871 Mbits/sec
  • Standard Deviation: 38.88 Mbits/sec

Latency, 100 RTTs:

  • Average: 0.369ms
  • Minimum: 0.290ms
  • Maximum: 0.473ms

Testcase 2: ssh dialing Coder Connect directly without a ProxyCommand

This is what we assume to be the 'best' way to use Coder Connect:

Throughput, 10 tests, back to back:

  • Average throughput across all tests: 789.50 Mbits/sec
  • Minimum average throughput: 708 Mbits/sec
  • Maximum average throughput: 839 Mbits/sec
  • Standard Deviation: 39.98 Mbits/sec

Latency, 100 RTTs:

  • Average: 0.369ms
  • Minimum: 0.267ms
  • Maximum: 0.440ms

Testcase 3: coder ssh ProxyCommand that creates its own Tailnet connection in-process

This is what normally happens when you run coder ssh:

Throughput, 10 tests, back to back:

  • Average throughput across all tests: 610.20 Mbits/sec
  • Minimum average throughput: 569 Mbits/sec
  • Maximum average throughput: 664 Mbits/sec
  • Standard Deviation: 27.29 Mbits/sec

Latency, 100 RTTs:

  • Average: 0.335ms
  • Minimum: 0.262ms
  • Maximum: 0.452ms

Analysis

Performing a two-tailed, unpaired t-test on the throughput results of testcases 1 and 2 gives a P value of 0.9450, so the difference between the two data sets is not statistically significant. In other words, if the two configurations truly had the same mean throughput, a difference at least as large as the one observed would be expected about 94.5% of the time.

Conclusion

From the t-test, and by comparison to the status quo (regular coder ssh, which uses gvisor, and is noticeably slower in throughput), I think it's safe to say any impact on throughput or latency by the ProxyCommand performing a bicopy against Coder Connect is negligible. Users are very unlikely to run into performance issues as a result of using Coder Connect via coder ssh, as implemented in this PR.

Less scientifically, I ran these same tests on my home network with my Sydney workspace, and both throughput and latency were consistent across testcases 1 and 2.


@@ -203,14 +206,14 @@ func (r *RootCmd) ssh() *serpent.Command {
 parsedEnv = append(parsedEnv, [2]string{k, v})
 }

-deploymentSSHConfig := codersdk.SSHConfigResponse{
+cliConfig := codersdk.SSHConfigResponse{

@ethanndickson ethanndickson Apr 28, 2025

Despite the name, this was always populated from CLI arguments, which in half of the cases are not the deployment SSH config (e.g. for the VS Code extension it's something like vscode-coder).

Comment on lines -188 to -195
-// SSHClient calls SSH to create a client that uses a weak cipher
-// to improve throughput.
+// SSHClient calls SSH to create a client
 func (c *AgentConn) SSHClient(ctx context.Context) (*ssh.Client, error) {
 return c.SSHClientOnPort(ctx, AgentSSHPort)
 }

-// SSHClientOnPort calls SSH to create a client on a specific port
-// that uses a weak cipher to improve throughput.

@ethanndickson ethanndickson

The weak cipher part hasn't been true for years; it used to use arcfour. Updating the comment since it confused me for a moment.

// command via the ProxyCommand SSH option.
networkInfoFilePath := filepath.Join(networkInfoDir, fmt.Sprintf("%d.json", os.Getppid()))
stats := &sshNetworkStats{
UsingCoderConnect: true,

@ethanndickson ethanndickson

I'll put up a PR for the vscode extension to read this.

@ethanndickson ethanndickson marked this pull request as ready for review April 28, 2025 08:00

ethanndickson commented Apr 28, 2025

The PTY test is failing on Windows, but the PTY functionality works fine in an actual PowerShell session on Windows, so I'll need to investigate.


matifali commented Apr 28, 2025

This is great @ethanndickson 🎉

Thanks. Given the PR description I believe that's automatically solved too but would appreciate it if you can test that.


ethanndickson commented Apr 29, 2025

Can you also test the behavior with coder/coder-jetbrains-toolbox#21

I've tested by swapping out the binary path in the SSH config, and specifying a --network-info-dir, and it worked as expected, a file was created with using_coder_connect set to true.

Of note is that because there's currently no way for coder ssh to communicate network stats (such as latency, derp/p2p) to the JetBrains extensions, there also won't be an indicator that the extension is using Coder Connect.


@deansheather deansheather left a comment


Currently this PR checks whether it can use Coder Connect and, if it can, runs a different function that doesn't handle special features like forwarding, containers, usage, etc.

You should instead just write some code that determines which mode to use, returns a net.Conn for it (or some similar interface with methods for getting mode, stats, etc.), then keep all of the rest of the code the exact same without any further connection-mode-specific branches.

@ethanndickson ethanndickson force-pushed the ethan/coder-ssh-w-coder-connect branch from c90e471 to be118e6 Compare April 29, 2025 13:25

ethanndickson commented Apr 29, 2025

I made the mistake of increasing the scope of this PR beyond what was necessary for the IDE integrations. Supporting Coder Connect throughout all coder ssh functionality is a big task with lots of sharp edges, and can't be tacked onto this PR.

I spoke with Dean about it, and for now, we'll only support using the Coder Connect tunnel if --stdio is passed. This matches the existing behaviour of coder ssh where features like GPG and Agent forwarding, or connecting straight to a devcontainer, are ignored if --stdio is passed.

@ethanndickson ethanndickson changed the title feat(cli): use coder connect in coder ssh, if available feat(cli): use coder connect in coder ssh --stdio, if available Apr 29, 2025

matifali commented Apr 29, 2025

This matches the existing behaviour of coder ssh where features like GPG and Agent forwarding, or connecting straight to a devcontainer, are ignored if --stdio is passed.

I think this is the reason we cannot do coder/vscode-coder#113

@ethanndickson ethanndickson merged commit 53ba361 into main Apr 30, 2025
35 checks passed
@ethanndickson ethanndickson deleted the ethan/coder-ssh-w-coder-connect branch April 30, 2025 05:17
@github-actions github-actions bot locked and limited conversation to collaborators Apr 30, 2025