
SSH to workspace not working anymore with Cloudflare proxying enabled #9337


Closed
jaulz opened this issue Aug 25, 2023 · 31 comments
Assignees
Labels
networking Area: networking s2 Broken use cases or features (with a workaround). Only humans may set this.

Comments

@jaulz

jaulz commented Aug 25, 2023

From one day to the next I can no longer connect to my workspace via SSH. I can SSH into other containers, but my main workspace does not connect. I downgraded, rebooted the whole instance, and restarted the workspace, but nothing helped, even though the workspace is shown as running (see screenshot).

Neither the Terminal button (nor any other button) nor coder ssh jaulz.workspace works. Both fail without even printing an error message. When I check the logs of the Docker container, I see this output from the agent:

2023-08-25 14:15:56.198 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2023-08-25 14:15:56.266 [debu]  net.tailnet.net.wgengine: magicsock: [0xc0003c32c0] derp.Recv(derp-999): derp.Recv: unexpectedly large frame of 1414811695 bytes returned
2023-08-25 14:15:56.266 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 5697 msec
2023-08-25 14:15:56.278 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1000stun0 from 167.235.129.2:57958 (1c14801356802acb4ae1ba97) in 10.998169ms
2023-08-25 14:15:56.303 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1001stun0 from 167.235.129.2:57958 (7e84418b23ab8c0e48696dca) in 36.522108ms
2023-08-25 14:15:56.493 [debu]  net.tailnet.net.wgengine: netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b
2023-08-25 14:15:56.548 [debu]  net.tailnet.net.wgengine: netcheck: [v1] report: udp=true v6=false v6os=false mapvarydest=false hair= portmap= v4a=167.235.129.2:57958 derp=999 derpdist=999v4:19ms
2023-08-25 14:16:01.965 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2023-08-25 14:16:02.051 [debu]  net.tailnet.net.wgengine: magicsock: [0xc0003c32c0] derp.Recv(derp-999): derp.Recv: unexpectedly large frame of 1414811695 bytes returned
2023-08-25 14:16:02.051 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 5150 msec
2023-08-25 14:16:02.063 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1000stun0 from 167.235.129.2:57958 (7757ffae0f447c677f87d0fd) in 11.04406ms
2023-08-25 14:16:02.272 [debu]  net.tailnet.net.wgengine: netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b
2023-08-25 14:16:02.320 [debu]  net.tailnet.net.wgengine: netcheck: [v1] report: udp=true v6=false v6os=false mapvarydest= hair= portmap= v4a=167.235.129.2:57958 derp=999 derpdist=999v4:23ms
2023-08-25 14:16:02.321 [debu]  net.tailnet: netinfo callback  netinfo="NetInfo{varies= hairpin= ipv6=false ipv6os=false udp=true icmpv4=false derp=#999 portmap= link=\"\"}"
2023-08-25 14:16:02.321 [debu]  net.tailnet: sending node  node="&{ID:nodeid:4be00bc6922401af AsOf:2023-08-25 14:16:02.321403 +0000 UTC Key:nodekey:30b321849632e6672891c1696bf0ac5dd4785b8f4bc8b80fe26a54f5e64d7b63 DiscoKey:discokey:4d5786f5d6f02e50372f6dc0da59bd31874763f61a76d68ff40458721c5d4f18 PreferredDERP:999 DERPLatency:map[999-v4:0.023000846] DERPForcedWebsocket:map[] Addresses:[fd7a:115c:a1e0:4270:b41f:f439:6ddb:fe51/128 fd7a:115c:a1e0:49d6:b259:b7ac:b1b2:48f4/128] AllowedIPs:[fd7a:115c:a1e0:4270:b41f:f439:6ddb:fe51/128 fd7a:115c:a1e0:49d6:b259:b7ac:b1b2:48f4/128] Endpoints:[167.235.129.2:57958 192.168.176.5:57958]}"
2023-08-25 14:16:07.202 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2023-08-25 14:16:07.268 [debu]  net.tailnet.net.wgengine: magicsock: [0xc0003c32c0] derp.Recv(derp-999): derp.Recv: unexpectedly large frame of 1414811695 bytes returned
2023-08-25 14:16:07.268 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 4659 msec
2023-08-25 14:16:07.306 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1001stun0 from 167.235.129.2:57958 (5ebfbbea8624a955074f5a09) in 36.667137ms
2023-08-25 14:16:07.328 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1002stun0 from 167.235.129.2:57958 (b23e39419eeceb4b133924f2) in 59.409467ms
2023-08-25 14:16:07.503 [debu]  net.tailnet.net.wgengine: netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b
2023-08-25 14:16:07.569 [debu]  net.tailnet.net.wgengine: netcheck: [v1] report: udp=true v6=false v6os=false mapvarydest=false hair= portmap= v4a=167.235.129.2:57958 derp=999 derpdist=999v4:23ms
2023-08-25 14:16:07.570 [debu]  net.tailnet: netinfo callback  netinfo="NetInfo{varies=false hairpin= ipv6=false ipv6os=false udp=true icmpv4=false derp=#999 portmap= link=\"\"}"
2023-08-25 14:16:07.570 [debu]  net.tailnet: sending node  node="&{ID:nodeid:4be00bc6922401af AsOf:2023-08-25 14:16:07.570705 +0000 UTC Key:nodekey:30b321849632e6672891c1696bf0ac5dd4785b8f4bc8b80fe26a54f5e64d7b63 DiscoKey:discokey:4d5786f5d6f02e50372f6dc0da59bd31874763f61a76d68ff40458721c5d4f18 PreferredDERP:999 DERPLatency:map[999-v4:0.023359301] DERPForcedWebsocket:map[] Addresses:[fd7a:115c:a1e0:4270:b41f:f439:6ddb:fe51/128 fd7a:115c:a1e0:49d6:b259:b7ac:b1b2:48f4/128] AllowedIPs:[fd7a:115c:a1e0:4270:b41f:f439:6ddb:fe51/128 fd7a:115c:a1e0:49d6:b259:b7ac:b1b2:48f4/128] Endpoints:[167.235.129.2:57958 192.168.176.5:57958]}"

Could this error, unexpectedly large frame of 1414811695 bytes returned, be the cause of the issue?

The workspace itself is created via this template:

resource "docker_container" "workspace" {
  count    = data.coder_workspace.me.start_count
  image    = docker_image.workspace.image_id
  # Uses lower() to avoid Docker restriction on container names.
  name     = "${data.coder_parameter.namespace.value}-${data.coder_workspace.me.owner}-${lower(data.coder_workspace.me.name)}-workspace"
  hostname = lower(data.coder_workspace.me.name)
  dns      = ["1.1.1.1"]
  memory   = 10000
  memory_swap   = -1
  # Use the docker gateway if the access URL is 127.0.0.1
  command  = ["sh", "-c", replace(coder_agent.workspace.init_script, "/localhost|127\\.0\\.0\\.1/", "host.docker.internal")]
  env      = [
    "CODER_AGENT_TOKEN=${coder_agent.workspace.token}"
  ]
  host {
    host = "host.docker.internal"
    ip   = "host-gateway"
  }
  volumes {
    container_path = "/home/coder/"
    volume_name    = docker_volume.home.name
    read_only      = false
  }
  networks_advanced {
    name = docker_network.internal.name
  }
  host {
    host = "${data.coder_parameter.workspace_host.value}"
    ip   = "127.0.0.1"
  }
  restart = "unless-stopped"
}

Has anything changed during recent updates which could cause this misbehaviour? Thanks a lot for your feedback!

@cdr-bot cdr-bot bot added the bug label Aug 25, 2023
@zpantskhava

Same here.

Starting today, I'm experiencing the same issue. I'm unable to connect to my workspace using either VSCode or the Jetbrains Gateway.

curl -X GET https://mycoder.uri/api/v2/debug/health -H 'Accept: application/json' -H 'Coder-Session-Token: czZF56msxH-NOD4anc2wec5nx6YjeHXai'

"999": {
"healthy": false,
"region": {
"EmbeddedRelay": true,
"RegionID": 999,
"RegionCode": "coder",
"RegionName": "Coder Embedded Relay",
"Nodes": [
{
"Name": "999b",
"RegionID": 999,
"HostName": "mycoder.uri",
"STUNPort": -1,
"DERPPort": 443
}
]
},
"node_reports": [
{
"healthy": false,
"node": {
"Name": "999b",
"RegionID": 999,
"HostName": "mycoder.uri",
"STUNPort": -1,
"DERPPort": 443
},
"node_info": {
"TokenBucketBytesPerSecond": 0,
"TokenBucketBytesBurst": 0
},
"can_exchange_messages": false,
"round_trip_ping": "",
"round_trip_ping_ms": 0,
"uses_websocket": false,
"client_logs": [
[
"derphttp.Client.Connect: connecting to https://mycoder.uri/derp"
],
[
"derphttp.Client.Connect: connecting to https://mycoder.uri/derp"
]
],
"client_errs": [
[
"recv derp message: derp.Recv: unexpectedly large frame of 1414811695 bytes returned"
],
[]
],
"error": null,
"stun": {
"Enabled": false,
"CanSTUN": false,
"Error": null
}
}
],
"error": null
}

netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b

So, no clue what happened.

@coadler
Contributor

coadler commented Aug 25, 2023

Hi @jaulz and @zpantskhava, could you both confirm what version you're running?

@deansheather
Member

Did this happen after a Coder upgrade?

Also, what setup do you use? What clouds do you use and what load balancers, reverse proxies or VPNs sit between you and Coder?

@zpantskhava

Hi @deansheather, it happened before the upgrade. I then upgraded to the latest version, but the problem remains.

The only thing that works is the terminal in the web UI.
coder version
Coder v2.1.3+ebd878b Thu Aug 24 17:35:00 UTC 2023
ebd878b

Full build of Coder, supports the server subcommand.

@zpantskhava

Did this happen after a Coder upgrade?

Also, what setup do you use? What clouds do you use and what load balancers, reverse proxies or VPNs sit between you and Coder?

I'm using a self-hosted Coder instance, on-prem.

@zpantskhava

zpantskhava commented Aug 25, 2023

coder login https://mycoder.uri/

Paste your token here:
Welcome to Coder, admin! You're authenticated.

coder list
WORKSPACE TEMPLATE STATUS HEALTHY LAST BUILT OUTDATED STARTS AT STOPS AFTER
admin/z base-template-Backend Started true 2m false - 1d

coder config-ssh
No changes to make.

ssh -vvv coder.z.main


OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017
debug1: Reading configuration data /root/.ssh/config
debug1: /root/.ssh/config line 18: Applying options for coder.z.main
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for *
debug1: Executing proxy command: exec /usr/bin/coder --global-config /root/.config/coderv2 ssh --stdio z.main
debug1: permanently_set_uid: 0/0
debug1: permanently_drop_suid: 0
debug1: identity file /root/.ssh/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4

Note: it gets stuck at this point.

Above is what I'm getting from a remote host while trying to connect to the workspace. Below is what I'm getting from the coder host itself:

ssh -vvv coder.z.main
OpenSSH_8.7p1, OpenSSL 3.0.7 1 Nov 2022
debug1: Reading configuration data /root/.ssh/config
debug1: /root/.ssh/config line 14: Applying options for coder.z.main
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 55: Including file /etc/ssh/ssh_config.d/50-redhat.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug2: checking match for 'final all' host coder.z.main originally coder.z.main
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 3: not matched 'final'
debug2: match not found
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-]
debug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512]
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /root/.ssh/config
debug1: /root/.ssh/config line 14: Applying options for coder.z.main
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 55: Including file /etc/ssh/ssh_config.d/50-redhat.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug2: checking match for 'final all' host coder.z.main originally coder.z.main
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 3: matched 'final'
debug2: match found
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-]
debug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512]
debug1: Executing proxy command: exec /usr/bin/coder --global-config /root/.config/coderv2 ssh --stdio z.main
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa_sk type -1
debug1: identity file /root/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: identity file /root/.ssh/id_ed25519_sk type -1
debug1: identity file /root/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /root/.ssh/id_xmss type -1
debug1: identity file /root/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.7
debug1: Remote protocol version 2.0, remote software version Go
debug1: compat_banner: no match: Go
debug2: fd 5 setting O_NONBLOCK
debug2: fd 4 setting O_NONBLOCK
debug1: Authenticating to coder.z.main:22 as 'root'
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug3: order_hostkeyalgs: no algorithms matched; accept original
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,ext-info-c
debug2: host key algorithms: ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-ed25519,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com,rsa-sha2-512,rsa-sha2-256
debug2: ciphers ctos: aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr,aes128-gcm@openssh.com,aes128-ctr
debug2: ciphers stoc: aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr,aes128-gcm@openssh.com,aes128-ctr
debug2: MACs ctos: hmac-sha2-256-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha1,umac-128@openssh.com,hmac-sha2-512
debug2: MACs stoc: hmac-sha2-256-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha1,umac-128@openssh.com,hmac-sha2-512
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
debug2: host key algorithms: rsa-sha2-256,rsa-sha2-512,ssh-rsa
debug2: ciphers ctos: aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr
debug2: ciphers stoc: aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr
debug2: MACs ctos: hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1,hmac-sha1-96
debug2: MACs stoc: hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1,hmac-sha1-96
debug2: compression ctos: none
debug2: compression stoc: none
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: rsa-sha2-512
debug1: kex: server->client cipher: aes256-gcm@openssh.com MAC: compression: none
debug1: kex: client->server cipher: aes256-gcm@openssh.com MAC: compression: none
debug1: kex: curve25519-sha256 need=32 dh_need=32
debug1: kex: curve25519-sha256 need=32 dh_need=32
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug3: receive packet: type 31
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-rsa SHA256:mVQQQ8xIdhx84ns6WugxAQDxYajqqOb6F0oj96oDpCo
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
Warning: Permanently added 'coder.z.main' (RSA) to the list of known hosts.
debug3: send packet: type 21
debug2: set_newkeys: mode 1
debug1: rekey out after 4294967296 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: receive packet: type 21
debug1: SSH2_MSG_NEWKEYS received
debug2: set_newkeys: mode 0
debug1: rekey in after 4294967296 blocks
debug1: Will attempt key: /root/.ssh/id_rsa
debug1: Will attempt key: /root/.ssh/id_dsa
debug1: Will attempt key: /root/.ssh/id_ecdsa
debug1: Will attempt key: /root/.ssh/id_ecdsa_sk
debug1: Will attempt key: /root/.ssh/id_ed25519
debug1: Will attempt key: /root/.ssh/id_ed25519_sk
debug1: Will attempt key: /root/.ssh/id_xmss
debug2: pubkey_prepare: done
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,rsa-sha2-256,rsa-sha2-512,ssh-rsa,ssh-dss>
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50
debug3: receive packet: type 52
Authenticated to coder.z.main (via proxy) using "none".
debug1: pkcs11_del_provider: called, provider_id = (null)
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Entering interactive session.
debug1: pledge: proc
debug3: receive packet: type 91
debug2: channel_input_open_confirmation: channel 0: callback start
debug2: client_session2_setup: id 0
debug2: channel 0: request pty-req confirm 1
debug3: send packet: type 98
debug2: channel 0: request shell confirm 1
debug3: send packet: type 98
debug2: channel_input_open_confirmation: channel 0: callback done
debug2: channel 0: open confirm rwindow 2097152 rmax 32768
debug3: receive packet: type 99
debug2: channel_input_status_confirm: type 99 id 0
debug2: PTY allocation request accepted on channel 0
debug3: receive packet: type 99
debug2: channel_input_status_confirm: type 99 id 0
debug2: shell request accepted on channel 0

coder@z:~$

Note: I have not generated any SSH keys, either on the local host or on the remote server.

@jaulz
Author

jaulz commented Aug 27, 2023

@coadler @deansheather it did not happen immediately after a Coder upgrade; I think it happened overnight when the workspace was rebooted. I am using the latest version, ghcr.io/coder/coder:v2.1.3, on Hetzner behind Cloudflare, without any additional VPNs etc. I also noticed that coder ssh --verbose test.main does not output any logs. Shouldn't --verbose print messages?

@jaulz
Author

jaulz commented Aug 27, 2023

I just tested it with the bundled Docker template and even that template does not work. I see a similar message there (i.e. unexpectedly large frame of 1414811695 bytes returned):

2023-08-27 12:17:55.378 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2023-08-27 12:17:55.451 [debu]  net.tailnet.net.wgengine: magicsock: [0xc00031b540] derp.Recv(derp-999): derp.Recv: unexpectedly large frame of 1414811695 bytes returned
2023-08-27 12:17:55.451 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 6086 msec
2023-08-27 12:17:55.508 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1002stun0 from 167.235.129.2:60677 (86de8232cd95792e8c25be55) in 56.938078ms
2023-08-27 12:17:55.663 [debu]  net.tailnet.net.wgengine: netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b
2023-08-27 12:17:55.715 [debu]  net.tailnet.net.wgengine: netcheck: [v1] report: udp=true v6=false v6os=false mapvarydest= hair= portmap= v4a=167.235.129.2:60677 derp=999 derpdist=999v4:22ms
2023-08-27 12:17:55.715 [debu]  net.tailnet: netinfo callback  netinfo="NetInfo{varies= hairpin= ipv6=false ipv6os=false udp=true icmpv4=false derp=#999 portmap= link=\"\"}"
2023-08-27 12:17:55.716 [debu]  net.tailnet: sending node  node="&{ID:nodeid:4d3a4e2e892a4c3d AsOf:2023-08-27 12:17:55.716165 +0000 UTC Key:nodekey:e07d24494fc311d81f42e417ab053d95e0078987582ef3d1dc36b582cc896258 DiscoKey:discokey:2887dc707c9c7db338127527ee9c6f918f05831eba777416dcbc81ce3fce751e PreferredDERP:999 DERPLatency:map[999-v4:0.022043496] DERPForcedWebsocket:map[] Addresses:[fd7a:115c:a1e0:42fb:b2c5:b1d1:76d5:b3c3/128 fd7a:115c:a1e0:49d6:b259:b7ac:b1b2:48f4/128] AllowedIPs:[fd7a:115c:a1e0:42fb:b2c5:b1d1:76d5:b3c3/128 fd7a:115c:a1e0:49d6:b259:b7ac:b1b2:48f4/128] Endpoints:[167.235.129.2:60677 172.17.0.4:60677]}"

Also, I see that the ports button is loading indefinitely (see screenshot).

@matifali matifali added the s1 Bugs that break core workflows. Only humans may set this. label Aug 28, 2023
@deansheather
Member

This log line is very confusing to us and we've never really seen it before:

2023-08-27 12:17:55.451 [debu]  net.tailnet.net.wgengine: magicsock: [0xc00031b540] derp.Recv(derp-999): derp.Recv: unexpectedly large frame of 1414811695 bytes returned

Since you're using Cloudflare, what happens if you disable Cloudflare proxying (turn off the orange cloud) and try it directly?

@jaulz
Author

jaulz commented Aug 28, 2023

@deansheather just tested it, but unfortunately it's the very same behaviour. By the way, this is the part of the log that repeats every few seconds (while trying to access the Terminal from the web or via SSH):

2023-08-28 15:16:28.868 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2023-08-28 15:16:28.932 [debu]  net.tailnet.net.wgengine: magicsock: [0xc0000e08c0] derp.Recv(derp-999): derp.Recv: unexpectedly large frame of 1414811695 bytes returned
2023-08-28 15:16:28.932 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 2516 msec
2023-08-28 15:16:28.944 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1000stun0 from 167.235.129.2:60328 (c8a223cd63f009ba498a5a2c) in 10.889982ms
2023-08-28 15:16:28.969 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1001stun0 from 167.235.129.2:60328 (d7e3f7fe0db4552289873070) in 36.440952ms
2023-08-28 15:16:29.148 [debu]  net.tailnet.net.wgengine: netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b
2023-08-28 15:16:29.201 [debu]  net.tailnet.net.wgengine: netcheck: [v1] report: udp=true v6=false v6os=false mapvarydest=false hair=false portmap= v4a=167.235.129.5:60328 derp=999 derpdist=999v4:20ms
2023-08-28 15:16:31.449 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2023-08-28 15:16:31.517 [debu]  net.tailnet.net.wgengine: magicsock: [0xc0000e08c0] derp.Recv(derp-999): derp.Recv: unexpectedly large frame of 1414811695 bytes returned
2023-08-28 15:16:31.517 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 3716 msec
2023-08-28 15:16:31.529 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1000stun0 from 167.235.129.2:60328 (50916f1e8440243b8cf6ac4c) in 10.953849ms
2023-08-28 15:16:31.742 [debu]  net.tailnet.net.wgengine: netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b
2023-08-28 15:16:31.792 [debu]  net.tailnet.net.wgengine: netcheck: [v1] report: udp=true v6=false v6os=false mapvarydest= hair=false portmap= v4a=167.235.129.5:60328 derp=999 derpdist=999v4:20ms
2023-08-28 15:16:31.793 [debu]  net.tailnet: netinfo callback  netinfo="NetInfo{varies= hairpin=false ipv6=false ipv6os=false udp=true icmpv4=false derp=#999 portmap= link=\"\"}"
2023-08-28 15:16:31.793 [debu]  net.tailnet: sending node  node="&{ID:nodeid:5828821950d16d19 AsOf:2023-08-28 15:16:31.793489 +0000 UTC Key:nodekey:b42eef46207095330126d157d053482bd6bd3a05b2a89c21f3c10918ab245c2e DiscoKey:discokey:220e37d95b94c725b5bdaa36ae7f24c52195fc9ac77a8c495bbd7b3b338db319 PreferredDERP:999 DERPLatency:map[999-v4:0.019556924] DERPForcedWebsocket:map[] Addresses:[fd7a:115c:a1e0:4e72:a7d7:7de6:af2e:92e7/128 fd7a:115c:a1e0:49d6:b259:b7ac:b1b2:48f4/128] AllowedIPs:[fd7a:115c:a1e0:4e72:a7d7:7de6:af2e:92e7/128 fd7a:115c:a1e0:49d6:b259:b7ac:b1b2:48f4/128] Endpoints:[167.235.129.2:60328 172.17.0.2:60328]}"

By the way, the metrics are updating correctly (see screenshot).

@zpantskhava

Likewise, tried to connect to it directly, but without success.

Here is the log from JetBrains Gateway:

2023-08-29 00:21:43,473 [1713088] INFO - CoderWorkspacesStepView - Configuring Coder CLI...
2023-08-29 00:21:43,474 [1713089] INFO - CoderCLIManager - Replacing existing config block
2023-08-29 00:21:43,474 [1713089] INFO - CoderWorkspacesStepView - Opening IDE and Project Location window for work1.main
2023-08-29 00:21:43,482 [1713097] INFO - CoderLocateRemoteProjectStepView - Connecting with SSH and uploading worker if missing... (attempt 1)
2023-08-29 00:21:43,483 [1713098] INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: 'uname -sm'. Parameters: redirectErrorStream=true, usePty=true
2023-08-29 00:21:43,484 [1713099] INFO - net.schmizz.sshj.transport.random.JCERandom - Creating new SecureRandom.
2023-08-29 00:21:43,631 [1713246] INFO - #c.i.s.ProxyCommand - ProxyCommand for coder.work1.main:22: "C:\Users\tst\AppData\Local\coder-gateway\10.10.0.18-3000\coder-windows-amd64.exe" --global-config "C:\Users\tst\AppData\Local\coder-gateway\10.10.0.18-3000\config" ssh --stdio work1.main
2023-08-29 00:21:43,631 [1713246] INFO - #c.i.s.ProxyCommand - Started ProxyCommand for coder.work1.main:22: Process[pid=12784, exitValue="not exited"]
2023-08-29 00:21:43,639 [1713254] INFO - #c.i.s.i.s.sshj - Client identity string: SSH-2.0-IntelliJ__Gateway_GW-232.9559.62__SSHJ_0.35.0

And log from workspace container:

2023-08-28 20:26:47.301 [debu] net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2023-08-28 20:26:47.433 [debu] net.tailnet.net.wgengine: magicsock: [0xc000820c80] derp.Recv(derp-999): derp.Recv: unexpectedly large frame of 1414811695 bytes returned
2023-08-28 20:26:47.433 [debu] net.tailnet.net.wgengine: derp-999: [v1] backoff: 4106 msec
2023-08-28 20:26:47.473 [debu] net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1000stun0 from XXX.XXX.XXX.XXX:54535 (1d877a5204f6222fd4442e9c) in 39.294968ms
2023-08-28 20:26:47.761 [debu] net.tailnet.net.wgengine: netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b
2023-08-28 20:26:47.820 [debu] net.tailnet.net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1001stun0 from XXX.XXX.XXX.XXX:54535 (f78b4efac11fb8dbbd5ba5ef) in 386.676216ms
2023-08-28 20:26:47.855 [debu] net.tailnet.net.wgengine: netcheck: [v1] report: udp=true v6=false v6os=false mapvarydest=false hair= portmap= v4a=XXX.XXX.XXX.XXX:54535 derp=999 derpdist=999v4:44ms

The HTTPS connection works perfectly fine. I'm able to list workspaces and start/stop them. However, as I mentioned earlier, I can only SSH into the workspace from the Coder host itself.

Where can we check the logs other than:
1) journalctl -u coder.service -b
2) workspace container logs
3) curl -X GET https://mycoder.uri/api/v2/debug/health -H 'Accept: application/json' -H 'Coder-Session-Token: czZF56msxH-NOD4anc2wec5nx6YjeHXai'

Can Coder be started in a "debug" mode that streams its data to a log file?

@spikecurtis
Contributor

coder ssh -v -l <logdir> <workspace> will write client-side logs to <logdir>

@zpantskhava

Thanks @spikecurtis

OpenSSH_8.7p1, OpenSSL 3.0.7 1 Nov 2022
debug1: Reading configuration data /root/.ssh/config
debug1: /root/.ssh/config line 14: Applying options for coder.work1.main
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /root/.ssh/config
debug1: /root/.ssh/config line 14: Applying options for coder.work1.main
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: Executing proxy command: exec /usr/bin/coder --global-config /root/.config/coderv2 ssh --stdio work1.main
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa_sk type -1
debug1: identity file /root/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: identity file /root/.ssh/id_ed25519_sk type -1
debug1: identity file /root/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /root/.ssh/id_xmss type -1
debug1: identity file /root/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.7


It becomes stuck at this point.

@spikecurtis
Contributor

@zpantskhava it looks like you're sending the output of ssh -vvv coder.z.main. What we'd like is the output of

coder ssh -v -l /tmp/ z.main

@zpantskhava

This comment was marked as spam.

@jaulz
Author

jaulz commented Aug 29, 2023

Yeah, that's basically the same log for me 😞

@jaulz
Author

jaulz commented Aug 29, 2023

Okay, it seems my downgrade didn't work the way I expected (I forgot to pass --build to docker-compose). After properly resetting the version to 2.1.0 it's partially working again: the ports button is no longer stuck in an infinite loop and the Terminal button works. However, when I try to connect from my local machine via SSH it still doesn't connect, and logs similar to the ones above appear in the log file.

@jaulz
Author

jaulz commented Aug 30, 2023

@deansheather I set up a completely new server and also removed Cloudflare from the setup, and now it seems to work again. Any idea which Cloudflare setting could cause these issues?

@zpantskhava are you also using Cloudflare?

@zpantskhava

Hi @jaulz,

Yes, I am using it. However, in my previous setup I connected directly to the Coder instance without using a domain name. So reinstalling probably resolved your issue, but there's no guarantee the same problem won't arise again; I'm hopeful it won't.

As for myself, I'm still trying to pinpoint the root cause and am hoping for some assistance.

@coadler
Contributor

coadler commented Sep 1, 2023

Another user reported this in Discord and we were able to fix it by disabling Cloudflare proxying, so I'm fairly confident this is where the issue is.

One way to work around this for now is to use Tailscale's hosted DERPs. You can do this by setting --derp-config-url https://controlplane.tailscale.com/derpmap/default and --derp-server-enable false. Otherwise, disabling the orange cloud for now seems to fix it.
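
For anyone applying this from the CLI, a minimal sketch of what that might look like when launching the server directly (flag names taken from this comment; adjust to however you actually start coder server, and note the exact boolean flag syntax may differ):

coder server \
  --derp-config-url https://controlplane.tailscale.com/derpmap/default \
  --derp-server-enable=false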

@0xCiBeR

0xCiBeR commented Sep 1, 2023

I can also replicate this behaviour using v2.1.4 & cloudflare. Disabling CF proxy does fix the issue.

@jaulz
Author

jaulz commented Sep 2, 2023

I can also replicate this behaviour using v2.1.4 & cloudflare. Disabling CF proxy does fix the issue.

@0xCiBeR cool, where exactly did you disable the proxy?

@0xCiBeR

0xCiBeR commented Sep 2, 2023

I can also replicate this behaviour using v2.1.4 & cloudflare. Disabling CF proxy does fix the issue.

@0xCiBeR cool, where exactly did you disable the proxy?

Just turning off the orange cloud on the specific DNS records in Cloudflare.

@amirhmoradi

Another user reported this in Discord and we were able to fix it by disabling Cloudflare proxying, so I'm fairly confident this is where the issue is.

One way to work around this for now is to use Tailscale's hosted DERPs. You can do this by setting --derp-config-url https://controlplane.tailscale.com/derpmap/default and --derp-server-enable false. Otherwise, disabling the orange cloud for now seems to fix it.

I confirm that setting the two options fixed the issue, even keeping Cloudflare proxying active.
I set them via env variables in my docker compose:

     # ...
      # Fix for SSH behind Cloudflare
      CODER_DERP_CONFIG_URL: "https://controlplane.tailscale.com/derpmap/default"
      CODER_DERP_SERVER_ENABLE: "false"
     # ...

@matifali
Member

matifali commented Sep 4, 2023

As we have a workaround to make this work, I am moving this from s1 to s2.

@matifali matifali removed the s1 Bugs that break core workflows. Only humans may set this. label Sep 4, 2023
@matifali matifali added the s2 Broken use cases or features (with a workaround). Only humans may set this. label Sep 4, 2023
@deansheather deansheather changed the title SSH to workspace not working anymore SSH to workspace not working anymore with Cloudflare proxying enabled Sep 5, 2023
@deansheather deansheather self-assigned this Sep 5, 2023
@deansheather
Member

Seems like a change to Coder (probably a change to the tailscale fork) caused Cloudflare proxying to break, which is affecting some of our enterprise customers.

@zpantskhava

Another user reported this in Discord and we were able to fix it by disabling Cloudflare proxying, so I'm fairly confident this is where the issue is.
One way to work around this for now is to use Tailscale's hosted DERPs. You can do this by setting --derp-config-url https://controlplane.tailscale.com/derpmap/default and --derp-server-enable false. Otherwise, disabling the orange cloud for now seems to fix it.

I confirm that setting the two options fixed the issue, even keeping Cloudflare proxying active. I set them via env variables in my docker compose:

     # ...
      # Fix for SSH behind Cloudflare
      CODER_DERP_CONFIG_URL: "https://controlplane.tailscale.com/derpmap/default"
      CODER_DERP_SERVER_ENABLE: "false"
     # ...

100% works for me as well. I've set it in coder.env:

CODER_DERP_CONFIG_URL="https://controlplane.tailscale.com/derpmap/default"
CODER_DERP_SERVER_ENABLE="false"

Awesome

@spikecurtis spikecurtis added the networking Area: networking label Sep 25, 2023
@spikecurtis
Contributor

DERP headers consist of 1 byte of type, followed by 4 bytes (big-endian, unsigned) of length. These logs don't include the type, but 1414811695 translates to the bytes [84, 84, 80, 47] in decimal, which in ASCII/UTF-8 is TTP/. So I'm willing to bet the actual data coming over the "derp" connection is an HTTP response, which generally looks something like:

HTTP/1.1 200 OK

We don't know the version or status code from the logs, as the DERP client doesn't get far enough decoding to tell.
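
If it helps to sanity-check that decoding, here is a small standalone Go sketch (not from the Coder codebase) that reinterprets the reported length as big-endian bytes:

package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// Length field reported in the agent logs.
	const frameLen uint32 = 1414811695

	// DERP frame lengths are 4 bytes, big-endian.
	var buf [4]byte
	binary.BigEndian.PutUint32(buf[:], frameLen)

	fmt.Printf("bytes: %v\n", buf)            // [84 84 80 47]
	fmt.Printf("ascii: %q\n", string(buf[:])) // "TTP/"
}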

The mystery is why the DERP client is operating on HTTP data.

In a normal HTTP upgrade scenario, the client would send something like

GET /derp HTTP/1.1
Connection: upgrade
Upgrade: derp

Then wait for the response like

HTTP/1.1 101 Switching protocols

and then, it would start processing data as the new protocol.

However, tailscale has this "fast start" mechanism where it sets a header Derp-Fast-Start: 1 on the initial request, which tells the server not to bother sending an HTTP response, and then it starts interpreting stuff as DERP immediately. https://github.com/coder/tailscale/blob/c821c9c9966dd8a58f0fb03aca30743f151e57de/derp/derphttp/derphttp_client.go#L559
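
In other words, with fast start the initial request would look roughly like this (the Derp-Fast-Start header name comes from the linked tailscale code; the rest is a sketch of the same upgrade request shown above):

GET /derp HTTP/1.1
Connection: upgrade
Upgrade: derp
Derp-Fast-Start: 1

and instead of waiting for the 101 response, the client starts reading DERP frames right away.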

If Cloudflare (or another proxy) were to strip this header out, or return an HTTP error response, then we'd end up with symptoms like our logs show.

For this to work, the DERP server has to have TLS enabled locally, and it injects an additional "meta certificate" into the ServerHello, which is self signed and contains the DERP public key encoded in the CommonName. That seems to imply that the Cloudflare proxy is leaving this metacert on the protocol even while it decrypts the TLS session from the client.

@spikecurtis spikecurtis self-assigned this Sep 27, 2023
@spikecurtis
Contributor

@jaulz @0xCiBeR @zpantskhava So far I've been unable to reproduce this while experimenting with Cloudflare. Can any of you share more details about the Cloudflare configuration you have when you're seeing this issue?

In particular,

What TLS/SSL encryption mode are you using for Coder?
What edge certificates are you using?
What (if any) origin certificates are you using?
Any other features (WAF, identity, DDoS, etc.) turned on for the domain?

@ammario
Member

ammario commented Oct 9, 2023

TL;DR

Use these environment variables to disable the internal DERP and use Tailscale's official map instead:

      CODER_DERP_CONFIG_URL: "https://controlplane.tailscale.com/derpmap/default"
      CODER_DERP_SERVER_ENABLE: "false"

Due to the obscurity of the DERP protocol, we don't guarantee any support of DERP behind reverse proxies such as Cloudflare.

@ammario ammario closed this as completed Oct 9, 2023
@jaulz
Author

jaulz commented Oct 9, 2023

Sorry, I missed this issue, but in the end it was just the orange cloud in the DNS settings, which hides the IP of the actual server. Since there is a workaround, I agree that it's enough for the time being. Thanks for your support!
