-
Notifications
You must be signed in to change notification settings - Fork 881
Return proper exit code on ssh with TTY #3192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
agent/agent.go
Outdated
// CoderSessionErrorCode indicates that something went wrong with the session, rather than the | ||
// command just returning a nonzero exit code, and is chosen as an arbitrary, high number | ||
// unlikely to shadow other exit codes, which are typically 1, 2, 3, etc. | ||
CoderSessionErrorCode = 229 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should call this "Magic" instead? It's specific to Coder, but that's somewhat implicit due to the repo name.
agent/agent.go
Outdated
if err != nil { | ||
a.logger.Warn(ctx, "ssh session failed", slog.Error(err)) | ||
_ = session.Exit(1) | ||
_ = session.Exit(CoderSessionErrorCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment explaining why exiting with 1 is bad could be helpful for future readers.
pty/pty_other.go
Outdated
return p.cmd.Process.Kill() | ||
} | ||
|
||
func (p *otherPtyWithProcess) waitInternal() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure I understand why we need to abstract this. It seems both implementations use the process
and wait. I feel like we could switch based on OS in the Start
func instead, and still return the underlying *os.Process
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not safe for multiple goroutines to Wait() on an os.Process
. This creates a race where one of them can get a SyscallError complaining that the child process doesn't exist.
We didn't notice this before because we were throwing away the errors in a TTY, but we do actually need them to return the correct exit code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I see. What do you think about returning an abstract Process in that case instead of with the TTY? Seems like they are separate concerns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can separate them out. The underlying impl will need a reference to the pty in the Linux case, so that we can call runtime.KeepAlive
, but that will be hidden from the caller.
Signed-off-by: Spike Curtis <spike@coder.com>
b4ba237
to
b60d811
Compare
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
@@ -29,6 +29,16 @@ type PTY interface { | |||
Resize(height uint16, width uint16) error | |||
} | |||
|
|||
// Process represents a process running in a PTY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be helpful to explain why this abstraction is necessary.
Related to #3125 - JetBrains use the exit code of commands they run over SSH for logic, so it is important that we report them correctly. They also request a TTY so they can send interrupt signals.
This PR updates our SSH agent to correctly return exit codes from SSH commands when a TTY is requested. Previously, we ignored errors from the TTY entirely.
It fixes a previously ignored error where both the PTY code and the Agent code would
Wait()
on the command process, resulting in a race where one waiter would get a SystemCallError instead of the exit status. Now only the PTY waits on the actual command. Users can now Wait() on the PTY to get the results of the inner command running in it.It also corrects an issue for non-TTY commands where we return exit code 1 any time the command returned non-zero: we now correctly return the actual exit code.