feat: Support keep-alive messages #3

Merged: 24 commits, Jun 15, 2023

mtojek
Member

@mtojek mtojek commented Jun 7, 2023

Related: coder/coder#7581

This PR adds keep-alive message support to the SSH server. It introduces two new configuration options: ClientAliveInterval and ClientAliveCountMax.
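
For readers unfamiliar with the two options, here is a minimal sketch of how they are usually understood to interact, assuming they follow the sshd_config options of the same name. The `keepAliveConfig` type and its helpers are hypothetical illustrations, not this library's actual API.

```go
package main

import (
	"fmt"
	"time"
)

// keepAliveConfig is a hypothetical stand-in for the two options named in
// this PR. It is not the library's actual type.
type keepAliveConfig struct {
	ClientAliveInterval time.Duration // quiet time before the server sends a keep-alive request
	ClientAliveCountMax int           // unanswered requests tolerated before disconnecting
}

// newKeepAliveConfig builds a config from an interval given in whole seconds.
func newKeepAliveConfig(intervalSeconds, countMax int) keepAliveConfig {
	return keepAliveConfig{
		ClientAliveInterval: time.Duration(intervalSeconds) * time.Second,
		ClientAliveCountMax: countMax,
	}
}

// enabled reports whether keep-alive probing is active.
// Assumption: a non-positive interval disables it, as in sshd_config.
func (c keepAliveConfig) enabled() bool {
	return c.ClientAliveInterval > 0
}

// worstCaseDetection approximates how long an unresponsive client can stay
// connected: one interval per tolerated unanswered request.
func (c keepAliveConfig) worstCaseDetection() time.Duration {
	if !c.enabled() {
		return 0
	}
	return c.ClientAliveInterval * time.Duration(c.ClientAliveCountMax)
}

func main() {
	cfg := newKeepAliveConfig(15, 3)
	fmt.Println(cfg.enabled(), cfg.worstCaseDetection())
}
```

With an interval of 15 seconds and a count of 3, an unresponsive client would be dropped after roughly 45 seconds under these assumptions.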

This library is relatively old and has little test coverage. I can add tests if you find them beneficial.

How to test it manually (CI doesn't run any tests):

Tab 1:

go run _examples/ssh-keepalive/keepalive.go

Tab 2:

ssh -o HostKeyAlgorithms=+ssh-rsa -vvv localhost -p 2222

@mtojek mtojek self-assigned this Jun 7, 2023
@mtojek mtojek marked this pull request as ready for review June 7, 2023 09:47
Member

Is it possible to add a test for this case? Start up a server with a low timeout, validate that

  1. Without keepalives, connection times out
  2. With keepalives, connection stays up
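
The two cases above can be sketched as a deterministic simulation that needs no real SSH connection. This is only an illustration: `simulateKeepAlive` is a hypothetical helper, and the rule that the server gives up after `countMax` consecutive unanswered requests is an assumption, not something confirmed by the PR.

```go
package main

import "fmt"

// simulateKeepAlive replays one keep-alive round per element of replied.
// replied[i] is true when the client answered the i-th request.
// It returns the 1-based round after which the server would give up, or 0
// if the connection survives every round.
// Assumption: the server disconnects once countMax consecutive requests
// have gone unanswered; the real implementation may count differently.
func simulateKeepAlive(countMax int, replied []bool) int {
	unanswered := 0
	for i, ok := range replied {
		if ok {
			unanswered = 0 // any reply proves the client is still there
			continue
		}
		unanswered++
		if unanswered >= countMax {
			return i + 1
		}
	}
	return 0
}

func main() {
	silent := []bool{false, false, false, false}
	responsive := []bool{true, true, true, true}
	fmt.Println("silent client dropped after round:", simulateKeepAlive(3, silent))
	fmt.Println("responsive client dropped after round:", simulateKeepAlive(3, responsive))
}
```

A return value of 0 means the connection stayed up for the whole run, which corresponds to case 2.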

Member Author

I managed to cover 2. with session stats, so we can ensure what was called and how many times.

Unfortunately, I can't address 1. easily, as it is not possible to intercept the SSH connection at the request level. Long story short, I don't know how to prevent the SSH client from responding to keep-alive requests.

Member

@code-asher code-asher left a comment

I tried out both ClientAliveInterval and ServerAliveInterval and they are working wonderfully.

One weird thing I noticed is that OpenSSH's server actually seems to reply with SSH_MSG_REQUEST_FAILURE (82), while we are now replying with SSH_MSG_REQUEST_SUCCESS (81).

Similarly, the OpenSSH client replies with SSH_MSG_CHANNEL_FAILURE (100).

Seems like odd behavior on openssh's part and the implementation here is better but I thought I would mention it in case there is some reason it is supposed to reply with failures (or maybe I am just doing something wrong).
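
One plausible reading of the failure replies is that a keep-alive request only needs some reply to show the peer is reachable, so success and failure serve equally well. The sketch below illustrates that reading; the message numbers come from RFC 4254, while the constant names and `provesPeerAlive` are hypothetical and not part of this library.

```go
package main

import "fmt"

// Message numbers as assigned in RFC 4254.
const (
	msgRequestSuccess byte = 81  // SSH_MSG_REQUEST_SUCCESS
	msgRequestFailure byte = 82  // SSH_MSG_REQUEST_FAILURE
	msgChannelSuccess byte = 99  // SSH_MSG_CHANNEL_SUCCESS
	msgChannelFailure byte = 100 // SSH_MSG_CHANNEL_FAILURE
)

// provesPeerAlive reports whether msgType is a reply to a global or channel
// request. Assumption: for keep-alive purposes any such reply counts, since
// a failure still shows that the peer received and processed the request.
func provesPeerAlive(msgType byte) bool {
	switch msgType {
	case msgRequestSuccess, msgRequestFailure, msgChannelSuccess, msgChannelFailure:
		return true
	}
	return false
}

func main() {
	for _, m := range []byte{msgRequestSuccess, msgRequestFailure, msgChannelFailure, 94} {
		fmt.Printf("message %d proves peer alive: %v\n", m, provesPeerAlive(m))
	}
}
```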

@mtojek
Member Author

mtojek commented Jun 12, 2023

> Seems like odd behavior on openssh's part and the implementation here is better but I thought I would mention it in case there is some reason it is supposed to reply with failures (or maybe I am just doing something wrong).

I had a similar observation. I tried with some other existing SSH servers, and my Mac SSH client behaves the same way - it replies with a failure.

TL;DR I added a comment paragraph describing the anomaly.

Member

@johnstcn johnstcn left a comment

I tried it out and it appears to work fine.
Found a couple of data races though.


t.Run("Server terminates connection due to no keep-alive replies", func(t *testing.T) {
	t.Parallel()
	t.Skip("Go SSH client doesn't support disabling replies to keep-alive requests. We can't test it easily without mocking logic.")
Member

😢

@mtojek
Member Author

mtojek commented Jun 14, 2023

This is a great finding, Cian. I forgot to run this with the race flag (-race) and discovered a can of worms. It looks like using ctx.KeepAliveCallback() always causes a race condition with the session loop. It is just a bad design.

I'm afraid that I have to start it over 💀

Read at 0x00c00038ccc0 by goroutine 95:
  github.com/gliderlabs/ssh.(*sshContext).KeepAliveCallback()
      /Users/mtojek/code/ssh/context.go:162 +0x44
  github.com/gliderlabs/ssh.KeepAliveRequestHandler()
      /Users/mtojek/code/ssh/session.go:530 +0x4c
  github.com/gliderlabs/ssh.(*Server).handleRequests()
      /Users/mtojek/code/ssh/server.go:335 +0x104
  github.com/gliderlabs/ssh.(*Server).HandleConn.func3()
      /Users/mtojek/code/ssh/server.go:309 +0x58

Previous write at 0x00c00038ccc0 by goroutine 100:
  github.com/gliderlabs/ssh.(*sshContext).SetValue()
      /Users/mtojek/code/ssh/context.go:127 +0x70
  github.com/gliderlabs/ssh.(*session).handleRequests()
      /Users/mtojek/code/ssh/session.go:311 +0x330
  github.com/gliderlabs/ssh.DefaultSessionHandler()
      /Users/mtojek/code/ssh/session.go:129 +0x36c
  github.com/gliderlabs/ssh.(*Server).HandleConn.func4()
      /Users/mtojek/code/ssh/server.go:319 +0x84

@mtojek
Member Author

mtojek commented Jun 15, 2023

Ok, I refactored the code by extracting the KeepAlive part to a separate entity guarded with a mutex.
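
As a rough illustration of the approach, the following sketch keeps the keep-alive counters in their own struct behind a mutex, so the goroutine sending requests and the goroutine reading replies never touch shared state unguarded. The `keepAliveState` type and its method names are hypothetical, and the counting rule is an assumption; the actual refactoring in this PR may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// keepAliveState is a hypothetical, mutex-guarded holder for keep-alive
// bookkeeping. It is a sketch, not the type introduced by this PR.
type keepAliveState struct {
	mu         sync.Mutex
	countMax   int // unanswered requests tolerated before closing
	unanswered int // requests sent since the last reply
}

func newKeepAliveState(countMax int) *keepAliveState {
	return &keepAliveState{countMax: countMax}
}

// tick is called once per interval by the goroutine that sends requests.
// It returns false when countMax requests are already outstanding, meaning
// the connection should be closed. Otherwise it records one more
// outstanding request and returns true.
func (k *keepAliveState) tick() bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.unanswered >= k.countMax {
		return false
	}
	k.unanswered++
	return true
}

// replyReceived is called by the goroutine that reads replies.
func (k *keepAliveState) replyReceived() {
	k.mu.Lock()
	defer k.mu.Unlock()
	k.unanswered = 0
}

// outstanding returns the number of requests still awaiting a reply.
func (k *keepAliveState) outstanding() int {
	k.mu.Lock()
	defer k.mu.Unlock()
	return k.unanswered
}

func main() {
	state := newKeepAliveState(3)
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); state.tick() }()
		go func() { defer wg.Done(); state.replyReceived() }()
	}
	wg.Wait()
	fmt.Println("outstanding after concurrent use:", state.outstanding())
}
```

Running a program like this with `go run -race` is one way to check that concurrent access to the counters no longer trips the race detector.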

@mtojek mtojek requested a review from johnstcn June 15, 2023 10:01
Member

@mafredri mafredri left a comment

All in all, the implementation logic seems solid, but some minor things came up.

I'm mainly concerned about us diverging from OpenSSH with the message types (see comment). Why don't we mimic the behavior?

4 participants