Skip to content

Jetbrains plugin clients are reporting both 0 and 1 for workspace session counts with workspace-usage experiment enabled #15176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
f0ssel opened this issue Oct 21, 2024 · 8 comments
Assignees
Labels
bug risk Prone to bugs observability Issues related to observability (metrics, dashboards, alerts, opentelemetry)

Comments

@f0ssel
Copy link
Contributor

f0ssel commented Oct 21, 2024

If you query the prometheus endpoint while connected to a workspace via the jetbrains plugin you can see the query sometimes return 0 and sometimes return 1. You would expect a solid 1 no matter when in the interval you query.

I think this may be caused by 2 separate reporting loops and possibly a bug in the query where we are selecting the most recent instead of a sum of the most recent.

@f0ssel f0ssel self-assigned this Oct 21, 2024
@coder-labeler coder-labeler bot added bug risk Prone to bugs observability Issues related to observability (metrics, dashboards, alerts, opentelemetry) labels Oct 21, 2024
@f0ssel
Copy link
Contributor Author

f0ssel commented Oct 25, 2024

When connecting with the latest gateway version and coder plugin I'm seeing the following behavior:

  1. Upon opening gateway we see 2 pgcoord connections made and no POSTs to /usage but still getting pushed to the workspace_usage_tracker - This is due to agent stats where connections > 0 getting counted for usage. We need to limit this to just tracked session types.
  2. Once the IDE connection is open we see 2 POST usage loops start about 3 seconds from eachother - IDK why there are 2 jetbrain IDE ssh connections being opened, but technically this is OK for insights data. Need to double check this isn't being counted as a double session anywhere.
  3. After closing the IDE and keeping the gateway splash page open you see 1 of the POST usage loops drop off, and one continue indefinitely. Either jetbrains is keeping one of these sessions around intentionally for some reason or it's a bug. We need to dig into why 2 connections are made and see if we can disable usage tracking on the one that stays open in the background.
  4. After closing the gateway app completely you see the 2nd usage loop drop off and the usage tracker stop shortly after.

So there's 2 problems to solve:

  1. Stop counting agent stats where all session counts are 0 as active usage
  2. Find a way to disable usage tracking for this 2nd "background" ssh connection in jetbrains plugin
[API] 2024-10-25 18:06:54.926 [info]  coderd.pgcoord: serving connection  coordinator_id=3a686752-f69f-418c-bee6-4715c6507e34  peer_id=e79b1bff-46e7-43ac-8d16-9fe4c5cfaddc  name=client  peer_id=e79b1bff-46e7-43ac-8d16-9fe4c5cfaddc
[API] 2024-10-25 18:06:59.472 [info]  coderd.pgcoord: serving connection  coordinator_id=3a686752-f69f-418c-bee6-4715c6507e34  peer_id=31d41232-e2c7-4c1b-b5f0-4b07327a265b  name=client  peer_id=31d41232-e2c7-4c1b-b5f0-4b07327a265b
[API] 2024-10-25 18:07:20.965 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:07:20.962329675Z"
[API] 2024-10-25 18:08:20.963 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:08:20.961945889Z"
[API] 2024-10-25 18:08:21.011 [info]  dbpurge: purged old database entries  duration=5.196059ms
[API] 2024-10-25 18:10:51.581 [info]  coderd.pgcoord: serving connection  coordinator_id=3a686752-f69f-418c-bee6-4715c6507e34  peer_id=4a30d8ed-5a2c-432f-9c4a-12809fdce88d  name=client  peer_id=4a30d8ed-5a2c-432f-9c4a-12809fdce88d
[API] 2024-10-25 18:10:52.038 [info]  coderd: post workspace usage  request_id=2c42b92e-e122-46d3-9657-b2c51bf7eba4  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:10:55.667 [info]  coderd.pgcoord: serving connection  coordinator_id=3a686752-f69f-418c-bee6-4715c6507e34  peer_id=d8682e85-6bdc-4677-9b12-eb5c5c7e5943  name=client  peer_id=d8682e85-6bdc-4677-9b12-eb5c5c7e5943
[API] 2024-10-25 18:10:55.891 [info]  coderd: post workspace usage  request_id=7220e750-18f6-4029-9be1-71d93bd72957  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:11:20.965 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:11:20.962321512Z"
[API] 2024-10-25 18:11:52.183 [info]  coderd: post workspace usage  request_id=f5b8eb3f-83e4-4365-8fb9-a2dcd5e967f7  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:11:56.103 [info]  coderd: post workspace usage  request_id=79517a01-602a-4740-86be-32156032d1b0  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:12:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:12:20.962459246Z"
[API] 2024-10-25 18:12:56.067 [info]  coderd: post workspace usage  request_id=f5bc7af6-80c9-4dc8-85f1-4ded99f0cc32  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:13:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:13:20.962026182Z"
[API] 2024-10-25 18:13:21.001 [info]  notifications.report_generator: report generator finished  duration=1.070327ms
[API] 2024-10-25 18:13:55.966 [info]  coderd: post workspace usage  request_id=5f4f0c3c-e91a-405a-9927-e8f195046ae2  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:14:20.962 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:14:20.961975023Z"
[API] 2024-10-25 18:14:55.967 [info]  coderd: post workspace usage  request_id=ef15e0b0-09b9-4a1d-919e-df0cd12cc884  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:15:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:15:20.961932434Z"
[API] 2024-10-25 18:15:55.964 [info]  coderd: post workspace usage  request_id=36f463e6-0869-4211-9e22-373b5172acfe  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:16:20.963 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:16:20.962812698Z"
[API] 2024-10-25 18:16:55.967 [info]  coderd: post workspace usage  request_id=2931a652-29af-473b-b1f6-d3969b526f65  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:17:20.965 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:17:20.96260274Z"
[API] 2024-10-25 18:17:55.969 [info]  coderd: post workspace usage  request_id=81ec2fdf-d617-435e-83d2-d859b9d92de6  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:18:20.963 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:18:20.96230574Z"
[API] 2024-10-25 18:18:21.021 [info]  dbpurge: purged old database entries  duration=8.243474ms
[API] 2024-10-25 18:18:55.966 [info]  coderd: post workspace usage  request_id=4bcc4e83-0186-40d8-9e3c-8a115fa876c9  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:19:20.965 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:19:20.961962874Z"
[API] 2024-10-25 18:19:55.967 [info]  coderd: post workspace usage  request_id=9fabf5bd-7138-4074-a2db-6c88da6db7de  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:20:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:20:20.962756136Z"
[API] 2024-10-25 18:20:55.964 [info]  coderd: post workspace usage  request_id=ebe02d70-573f-4ce4-9bc5-f4ee639a228f  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:21:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:21:20.962307284Z"
[API] 2024-10-25 18:21:55.968 [info]  coderd: post workspace usage  request_id=5ba022eb-af00-42ac-ae55-34287f4a1634  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:22:20.963 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:22:20.962249594Z"
[API] 2024-10-25 18:22:55.964 [info]  coderd: post workspace usage  request_id=f6c643a6-df6e-4c50-98c8-628074556dea  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:23:20.965 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:23:20.962830727Z"
[API] 2024-10-25 18:23:55.964 [info]  coderd: post workspace usage  request_id=048e19db-badc-4046-9140-0c5d5e893704  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:24:20.963 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:24:20.962087806Z"
[API] 2024-10-25 18:24:55.965 [info]  coderd: post workspace usage  request_id=b438689e-2180-4f6b-923b-5521ed2da358  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:25:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:25:20.962254608Z"
[API] 2024-10-25 18:25:55.962 [info]  coderd: post workspace usage  request_id=5c40217a-42a8-41ec-a4bf-7c60e257bca4  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:26:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:26:20.962644274Z"
[API] 2024-10-25 18:26:55.963 [info]  coderd: post workspace usage  request_id=2e77918d-f0f2-414c-bf1c-e4ad7ea874c9  workspace_id=3889fe87-9ea4-487f-9b33-2ba149f0e3b3
[API] 2024-10-25 18:27:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:27:20.962831837Z"
[API] 2024-10-25 18:28:20.964 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-10-25T18:28:20.962146081Z"
[API] 2024-10-25 18:28:21.002 [info]  notifications.report_generator: report generator finished  duration=1.339698ms
[API] 2024-10-25 18:28:21.039 [info]  dbpurge: purged old database entries  duration=6.680897ms

@f0ssel
Copy link
Contributor Author

f0ssel commented Oct 25, 2024

@code-asher Would you be able to help me track down this extra "IDE connection" being made in the jetbrains plugin? Basically there's a few places in the plugin we make SSH connections where we pass --usage-app=disable so that activity tracking for that connection is ignored. I'm thinking there's a place where we are using the main IDE hostname when we should be using the background hostname, or Jetbrains is (maybe accidentally?) making 2 main IDE connections and just leaving one open forever. Hopefully it's the former and we can patch it, if it's the latter we'll need to get Jetbrains help I think.

Here's the gateway logs shows the two IDE connections using the non-background hostname:

2024-10-25 15:00:51,990 [  36862]   INFO - #c.j.g.s.SshPortForwarder - Client /127.0.0.1:53921 connected to persistent channel /127.0.0.1:5990
2024-10-25 15:00:52,019 [  36891]   INFO - net.schmizz.sshj.transport.random.JCERandom - Creating new SecureRandom.
2024-10-25 15:00:52,024 [  36896]   INFO - #c.i.s.ProxyCommand - ProxyCommand for coder-jetbrains--jetbrains1.main--localhost:22: "/Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/coder-darwin-arm64" --global-config "/Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/config" --url http://localhost:3000 ssh --stdio --disable-autostart --usage-app=jetbrains admin/jetbrains1.main
2024-10-25 15:00:52,027 [  36899]   INFO - #c.i.s.ProxyCommand - Started ProxyCommand for coder-jetbrains--jetbrains1.main--localhost:22: Process[pid=21669, exitValue="not exited"]
2024-10-25 15:00:52,027 [  36899]   INFO - #c.i.s.i.s.sshj - Client identity string: SSH-2.0-IntelliJ__Gateway_GW-242.23339.42__SSHJ_0.38.1_SNAPSHOT
2024-10-25 15:00:52,914 [  37786]   INFO - #c.i.s.i.s.sshj - Server identity string: SSH-2.0-Go
2024-10-25 15:00:52,998 [  37870]   INFO - #c.i.s.i.s.sshj - Authentication log: SSH connection to coder@coder-jetbrains--jetbrains1.main--localhost:22
 * With altered connection settings:
  - authMethods: PublicKey(privateKeys=[/Users/f0ssel/.ssh/id_rsa, /Users/f0ssel/.ssh/id_ecdsa, /Users/f0ssel/.ssh/id_ecdsa_sk, /Users/f0ssel/.ssh/id_ed25519, /Users/f0ssel/.ssh/id_ed25519_sk, /Users/f0ssel/.ssh/id_xmss, /Users/f0ssel/.ssh/id_dsa], agent=ALL), Password, KeyboardInteractive
  - ciphers: [chacha20-poly1305@openssh.com, aes128-ctr, aes192-ctr, aes256-ctr, aes128-gcm@openssh.com, aes256-gcm@openssh.com]
  - connectTimeout: PT0S
  - environmentVariables: [Inherit(glob=LANG), Inherit(glob=LC_*), Certain(name=CODER_SSH_SESSION_TYPE, value=JetBrains)]
  - hostKeyAlgorithms: [ssh-ed25519-cert-v01@openssh.com, ecdsa-sha2-nistp256-cert-v01@openssh.com, ecdsa-sha2-nistp384-cert-v01@openssh.com, ecdsa-sha2-nistp521-cert-v01@openssh.com, sk-ssh-ed25519-cert-v01@openssh.com, sk-ecdsa-sha2-nistp256-cert-v01@openssh.com, rsa-sha2-512-cert-v01@openssh.com, rsa-sha2-256-cert-v01@openssh.com, ssh-ed25519, ecdsa-sha2-nistp256, ecdsa-sha2-nistp384, ecdsa-sha2-nistp521, sk-ssh-ed25519@openssh.com, sk-ecdsa-sha2-nistp256@openssh.com, rsa-sha2-512, rsa-sha2-256]
  - hostKeyVerifier: OpenSSH-like host key verifier (knownHostsFile=[/dev/null, /private/etc/ssh/ssh_known_hosts, /private/etc/ssh/ssh_known_hosts2], strictHostKeyChecking=NO, hashKnownHosts=false)
  - kexAlgorithms: [sntrup761x25519-sha512@openssh.com, curve25519-sha256, curve25519-sha256@libssh.org, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521, diffie-hellman-group-exchange-sha256, diffie-hellman-group16-sha512, diffie-hellman-group18-sha512, diffie-hellman-group14-sha256]
  - macs: [umac-64-etm@openssh.com, umac-128-etm@openssh.com, hmac-sha2-256-etm@openssh.com, hmac-sha2-512-etm@openssh.com, hmac-sha1-etm@openssh.com, umac-64@openssh.com, umac-128@openssh.com, hmac-sha2-256, hmac-sha2-512, hmac-sha1]
  - proxyConfig: Command(command="/Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/coder-darwin-arm64" --global-config "/Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/config" --url http://localhost:3000 ssh --stdio --disable-autostart --usage-app=jetbrains admin/jetbrains1.main)
  - user: coder
 * With auth agent ssh-agent[/private/tmp/com.apple.launchd.RfWkAqq1se/Listeners]
 * Connected: Socket[addr=null,port=0,localport=0]
 * Starting authentication
  => none (success)
2024-10-25 15:00:53,018 [  37890]   INFO - #c.j.g.s.SshPortForwarder - Forwarded from coder-jetbrains--jetbrains1.main--localhost:5990 to local address localhost/127.0.0.1:53932
2024-10-25 15:00:53,019 [  37891]   INFO - #c.j.g.s.SshPortForwarder - Preparing channel to localhost/127.0.0.1:53932
2024-10-25 15:00:53,020 [  37892]   INFO - #c.j.g.s.SshPortForwarder - Intermediate channel to forwarded localhost/127.0.0.1:53932 connected, local address is /127.0.0.1:53933
2024-10-25 15:00:53,023 [  37895]   INFO - #c.j.g.s.SshPortForwarder - Waiting for new connection from client to persistent channel (/127.0.0.1:5990)
2024-10-25 15:00:53,023 [  37895]   INFO - #c.j.g.s.SshPortForwarder - Starting reading loop: /127.0.0.1:53921 -> localhost/127.0.0.1:53932
2024-10-25 15:00:53,023 [  37895]   INFO - #c.j.g.s.SshPortForwarder - Starting reading loop: localhost/127.0.0.1:53932 -> /127.0.0.1:53921
2024-10-25 15:00:55,111 [  39983]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:00:56,004 [  40876]   INFO - net.schmizz.sshj.transport.random.JCERandom - Creating new SecureRandom.
2024-10-25 15:00:56,016 [  40888]   INFO - #c.i.s.ProxyCommand - ProxyCommand for coder-jetbrains--jetbrains1.main--localhost:0: "/Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/coder-darwin-arm64" --global-config "/Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/config" --url http://localhost:3000 ssh --stdio --disable-autostart --usage-app=jetbrains admin/jetbrains1.main
2024-10-25 15:00:56,022 [  40894]   INFO - #c.i.s.ProxyCommand - Started ProxyCommand for coder-jetbrains--jetbrains1.main--localhost:0: Process[pid=21685, exitValue="not exited"]
2024-10-25 15:00:56,022 [  40894]   INFO - #c.i.s.i.s.sshj - Client identity string: SSH-2.0-IntelliJ__Gateway_GW-242.23339.42__SSHJ_0.38.1_SNAPSHOT
2024-10-25 15:00:56,646 [  41518]   INFO - CoderRemoteConnectionHandle - Adding GO-242.23339.24 for coder-jetbrains--jetbrains1.main--localhost:/home/admin to recent connections
2024-10-25 15:00:56,861 [  41733]   INFO - #c.i.s.i.s.sshj - Server identity string: SSH-2.0-Go
2024-10-25 15:00:56,954 [  41826]   INFO - #c.i.s.i.s.sshj - Authentication log: SSH connection to coder@coder-jetbrains--jetbrains1.main--localhost:0
 * With altered connection settings:
  - authMethods: PublicKey(privateKeys=[/Users/f0ssel/.ssh/id_rsa, /Users/f0ssel/.ssh/id_ecdsa, /Users/f0ssel/.ssh/id_ecdsa_sk, /Users/f0ssel/.ssh/id_ed25519, /Users/f0ssel/.ssh/id_ed25519_sk, /Users/f0ssel/.ssh/id_xmss, /Users/f0ssel/.ssh/id_dsa], agent=ALL), Password, KeyboardInteractive
  - ciphers: [chacha20-poly1305@openssh.com, aes128-ctr, aes192-ctr, aes256-ctr, aes128-gcm@openssh.com, aes256-gcm@openssh.com]
  - connectTimeout: PT0S
  - environmentVariables: [Inherit(glob=LANG), Inherit(glob=LC_*), Certain(name=CODER_SSH_SESSION_TYPE, value=JetBrains)]
  - hostKeyAlgorithms: [ssh-ed25519-cert-v01@openssh.com, ecdsa-sha2-nistp256-cert-v01@openssh.com, ecdsa-sha2-nistp384-cert-v01@openssh.com, ecdsa-sha2-nistp521-cert-v01@openssh.com, sk-ssh-ed25519-cert-v01@openssh.com, sk-ecdsa-sha2-nistp256-cert-v01@openssh.com, rsa-sha2-512-cert-v01@openssh.com, rsa-sha2-256-cert-v01@openssh.com, ssh-ed25519, ecdsa-sha2-nistp256, ecdsa-sha2-nistp384, ecdsa-sha2-nistp521, sk-ssh-ed25519@openssh.com, sk-ecdsa-sha2-nistp256@openssh.com, rsa-sha2-512, rsa-sha2-256]
  - hostKeyVerifier: OpenSSH-like host key verifier (knownHostsFile=[/dev/null, /private/etc/ssh/ssh_known_hosts, /private/etc/ssh/ssh_known_hosts2], strictHostKeyChecking=NO, hashKnownHosts=false)
  - kexAlgorithms: [sntrup761x25519-sha512@openssh.com, curve25519-sha256, curve25519-sha256@libssh.org, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521, diffie-hellman-group-exchange-sha256, diffie-hellman-group16-sha512, diffie-hellman-group18-sha512, diffie-hellman-group14-sha256]
  - macs: [umac-64-etm@openssh.com, umac-128-etm@openssh.com, hmac-sha2-256-etm@openssh.com, hmac-sha2-512-etm@openssh.com, hmac-sha1-etm@openssh.com, umac-64@openssh.com, umac-128@openssh.com, hmac-sha2-256, hmac-sha2-512, hmac-sha1]
  - port: 0
  - proxyConfig: Command(command="/Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/coder-darwin-arm64" --global-config "/Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/config" --url http://localhost:3000 ssh --stdio --disable-autostart --usage-app=jetbrains admin/jetbrains1.main)
  - user: coder
 * With auth agent ssh-agent[/private/tmp/com.apple.launchd.RfWkAqq1se/Listeners]
 * Connected: Socket[addr=null,port=0,localport=0]
 * Starting authentication
  => none (success)
2024-10-25 15:01:00,371 [  45243]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:00,475 [  45347]   INFO - #c.i.w.i.i.j.s.JpsGlobalModelSynchronizerImpl - Saving global entities com.intellij.platform.workspace.jps.entities.SdkEntity to files
2024-10-25 15:01:00,476 [  45348]   INFO - #c.i.w.i.i.j.s.JpsGlobalModelSynchronizerImpl - Saving global entities com.intellij.platform.workspace.jps.entities.LibraryEntity to files
2024-10-25 15:01:05,663 [  50535]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:10,949 [  55821]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:16,241 [  61113]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:18,404 [  63276]   INFO - #c.i.w.i.i.j.s.JpsGlobalModelSynchronizerImpl - Saving global entities com.intellij.platform.workspace.jps.entities.SdkEntity to files
2024-10-25 15:01:18,406 [  63278]   INFO - #c.i.w.i.i.j.s.JpsGlobalModelSynchronizerImpl - Saving global entities com.intellij.platform.workspace.jps.entities.LibraryEntity to files
2024-10-25 15:01:21,517 [  66389]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:26,819 [  71691]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:32,119 [  76991]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:37,411 [  82283]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:42,694 [  87566]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:47,983 [  92855]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:53,250 [  98122]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:01:58,536 [ 103408]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:03,833 [ 108705]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:09,219 [ 114091]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:14,504 [ 119376]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:19,796 [ 124668]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:25,085 [ 129957]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:30,347 [ 135219]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:35,625 [ 140497]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:40,908 [ 145780]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:46,191 [ 151063]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:51,464 [ 156336]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:02:56,747 [ 161619]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:03:02,022 [ 166894]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:03:07,296 [ 172168]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:03:12,582 [ 177454]   INFO - CoderRemoteConnectionHandle - GO-242.23339.24 status: pid=53982, alive=true
2024-10-25 15:03:14,046 [ 178918]   INFO - #c.j.g.t.ThinClientHandle - Message from thin client d00eae8b-240f-49ff-8466-fd2b2de51b31: com.intellij.remoteDev.thinClientLink.ClientToGtwMessage$ClientAsksLogs@21ef8dbf

One thing I can notice is that the two have different ports in the targetted hostname:

  • coder-jetbrains--jetbrains1.main--localhost:22
  • coder-jetbrains--jetbrains1.main--localhost:0

@f0ssel
Copy link
Contributor Author

f0ssel commented Oct 25, 2024

After some more investigation into the underlying ssh processes going on I've noticed the following:

After opening IDE to workspace:

ps aux | grep -e "--usage-app="
f0ssel           28578   1.2  0.2 409780816  41840   ??  S     3:25PM   0:00.46 /Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/coder-darwin-arm64 --global-config /Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/config --url http://localhost:3000 ssh --stdio --disable-autostart --usage-app=jetbrains admin/jetbrains1.main
f0ssel           28502   0.7  0.2 409790336  37376   ??  S     3:25PM   0:00.45 /Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/coder-darwin-arm64 --global-config /Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/config --url http://localhost:3000 ssh --stdio --disable-autostart --usage-app=disable admin/jetbrains1.main
f0ssel           28595   0.6  0.2 409788496  40704   ??  S     3:25PM   0:00.21 /Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/coder-darwin-arm64 --global-config /Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/config --url http://localhost:3000 ssh --stdio --disable-autostart --usage-app=jetbrains admin/jetbrains1.main
f0ssel           28517   0.4  0.2 409780304  36384   ??  S     3:25PM   0:00.31 /Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/coder-darwin-arm64 --global-config /Users/f0ssel/Library/Application Support/coder-gateway/localhost-3000/config --url http://localhost:3000 ssh --stdio --disable-autostart --usage-app=disable admin/jetbrains1.main
f0ssel           28695   0.0  0.0 408626896   1408 s001  S+    3:25PM   0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox -e --usage-app=

Unfortunately these 4 connections do not get closed when I close the IDE window. As long as the Gateway landing window is still open it seems these SSH connections are held open indefinitely. So even if we correctly use the background hostname for 1 of the 2 connections, it seems like Jetbrains Gateway will keep the old IDE SSH connection around for a while (I'm testing to see just how long right now).

@f0ssel
Copy link
Contributor Author

f0ssel commented Oct 25, 2024

It's also possible the issue is on our end and we are failing to close the ssh process properly, maybe something here: https://github.com/coder/coder/blob/main/cli/ssh.go#L318

@johnstcn
Copy link
Member

johnstcn commented Oct 28, 2024

I think it might actually be the IDE keeping the connections open (perhaps intentionally)?
I was able to replicate this sort of eager connection caching behaviour by simply using the "Remote - SSH" connection in combination with coder config-ssh without the JetBrains Coder plugin installed. Even by adding a connection and clicking 'Test Connection', the coder ssh --stdio process stays hanging around indefinitely until you close the IDE. It even stays open if you open a local project.

Strangely enough, it doesn't do this for other SSH connections. I wonder if it's related to the ProxyCommand?

@f0ssel
Copy link
Contributor Author

f0ssel commented Oct 28, 2024

Thanks for looking into that and helping with the search. I've also been building and rebuilding the plugin locally so I can do some AB testing and I've narrowed down both SSH connections with --usage-app=jetbrains being spawned from a single SSHClient in the plugin:

https://github.com/coder/jetbrains-coder/blob/f0ssel/2-jetbrains-ssh-sessions/src/main/kotlin/com/coder/gateway/CoderRemoteConnectionHandle.kt#L254-L262

I'm not sure if we have control over the layer that's making the client and holding open these SSH connections. Something in the SSH config may be able to help us here.

@f0ssel
Copy link
Contributor Author

f0ssel commented Oct 28, 2024

Here's a client log that is showing constant reconnections with the main hostname, not sure if this is exactly related but could be a sign that we possibly aren't closing a connection somewhere?

https://gist.github.com/f0ssel/3f52cc8dd078a4079fc2b4aaf16d161d

@f0ssel
Copy link
Contributor Author

f0ssel commented Oct 29, 2024

Closing in favor of coder/internal#179 now that we better understand the root cause.

@f0ssel f0ssel closed this as completed Oct 29, 2024
f0ssel added a commit that referenced this issue Oct 29, 2024
Part of #15176

I originally kept this the same because I wanted to be conservative
about when we start dropping activity, but this is proving to be a
problem when using `coder ssh` with `--usage-app=disabled`. Because the
workspace agent still counts this as a connection (I think it still
should so it's counted somewhere) but not as a SSH / IDE session. This
leads to background ssh tasks that want to be untracked still continuing
to bump activity when it shouldn't. This makes it so we have to have an
explicit session to bump activity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug risk Prone to bugs observability Issues related to observability (metrics, dashboards, alerts, opentelemetry)
Projects
None yet
Development

No branches or pull requests

2 participants