
bug: panic in coder ssh/vscodessh #15616


Closed
johnstcn opened this issue Nov 21, 2024 · 7 comments · Fixed by #15927
Labels
s1 Bugs that break core workflows. Only humans may set this.

Comments

@johnstcn
Member

Problem

Multiple users have reported the following stack trace when running coder ssh or coder vscodessh:

panic: runtime error: slice bounds out of range [4:0]

goroutine 111 [running]:
golang.org/x/net/route.parseInetAddr(0x80?, {0xc000682484, 0xc000716700?, 0x794})
	/home/runner/go/pkg/mod/golang.org/x/net@v0.30.0/route/address.go:188 +0x267
golang.org/x/net/route.parseAddrs(0x15, 0x8602060, {0xc000682474, 0x28, 0x7a4})
	/home/runner/go/pkg/mod/golang.org/x/net@v0.30.0/route/address.go:408 +0xdd
golang.org/x/net/route.(*wireFormat).parseRouteMessage(0xc000010048, 0x6e23f8a?, {0xc000682418, 0x84, 0x800})
	/home/runner/go/pkg/mod/golang.org/x/net@v0.30.0/route/route_classic.go:70 +0x2fd
golang.org/x/net/route.ParseRIB(0x1, {0xc000682418?, 0xc00047e720?, 0xc0002a4f50?})
	/home/runner/go/pkg/mod/golang.org/x/net@v0.30.0/route/message.go:55 +0x1b3
tailscale.com/net/netmon.(*darwinRouteMon).Receive(0xc000682408)
	/home/runner/go/pkg/mod/github.com/coder/tailscale@v1.1.1-0.20241003034647-02286e537fc2/net/netmon/netmon_darwin.go:59 +0x68
tailscale.com/net/netmon.(*Monitor).pump(0xc000174480)
	/home/runner/go/pkg/mod/github.com/coder/tailscale@v1.1.1-0.20241003034647-02286e537fc2/net/netmon/netmon.go:250 +0x82
created by tailscale.com/net/netmon.(*Monitor).Start in goroutine 1
	/home/runner/go/pkg/mod/github.com/coder/tailscale@v1.1.1-0.20241003034647-02286e537fc2/net/netmon/netmon.go:190 +0x179

What we know so far:

  • User A reports that it occurred after an "ssh timeout". In this case, the user was connected to a VPN (TCP over UDP).
  • User B reports that it occurs when they are working from home and connect to their corporate VPN, and then attempt to connect to their workspace using Visual Studio Code + Plugin (that is, coder vscodessh). Their log also shows the following line right before the panic (redacted):
      net.wgengine: portmapper: saw UPnP type WANIPConnection1 at http://xxx.xxx.xxx.xxx:1900/uubfk/rootDesc.xml; ROUTER MAKE (ROUTER MODEL)
    
  • In both cases, the stack trace mentions netmon_darwin.go, which suggests that the problem is limited to the macOS build of the CLI.
  • In both cases, the users reportedly were connected to a VPN.
@johnstcn johnstcn added the s1 Bugs that break core workflows. Only humans may set this. label Nov 21, 2024
@coder-labeler coder-labeler bot added the bug risk Prone to bugs label Nov 21, 2024
@spikecurtis spikecurtis removed the bug risk Prone to bugs label Nov 21, 2024
@spikecurtis
Contributor

spikecurtis commented Nov 25, 2024

Tailscale are also hitting it: golang/go#70528

Since they're raising it around the same time we're seeing it in the field, I suspect it's related to a macOS update. The user who reported it is on macOS 15.1, which was released Oct 28th.

@Emyrk
Member

Emyrk commented Dec 18, 2024

@johnstcn are you able to reproduce this?

@spikecurtis
Contributor

@deansheather found an example kernel packet that triggers the bug and confirmed that the upstream fix resolves the panic for that packet. We got the packet from a customer who now says they can no longer reproduce the issue, even without a fixed binary.

@annapst

annapst commented Dec 30, 2024

Thank you for pushing the upgrade already! I'm also seeing this every time I run coder ping, and my IDE drops the connection and reconnects every couple of minutes. I needed to upgrade to Sequoia earlier this week. Looking forward to the release that includes this fix!

@Emyrk
Member

Emyrk commented Dec 30, 2024

@annapst when this commit is merged, it should be resolved: 2bba3d7

I worked with a colleague who has a Mac but was unable to reproduce the issue, so we could not confirm that this change actually solves it.

If you still see this issue on >v2.18.0, please let us know and reopen this issue.

@annapst

annapst commented Dec 30, 2024

Thank you! I see the commit is from last week, so probably need to try a future release to include this?

@Emyrk
Member

Emyrk commented Dec 30, 2024

> Thank you! I see the commit is from last week, so probably need to try a future release to include this?

Correct, the next release will be v2.19.0. That should have the fix.

Emyrk added a commit that referenced this issue Jan 2, 2025
spikecurtis added a commit that referenced this issue Jan 27, 2025
chore: migrate to coder/websocket 1.8.12 (#15898)

Migrates us to `coder/websocket` v1.8.12 from an older version of `nhooyr/websocket`.

Works around coder/websocket#504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
spikecurtis added a commit that referenced this issue Jan 27, 2025
… (#16265)

Cherry-picks #15898 and #15927 to backport the fix to #15616

#15927 is the fix, but it depends on #15898 because of some earlier work
in the tailscale fork to move everything to `coder/websocket`.