Skip to content

Return proper exit code on ssh with TTY #3192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 27, 2022
Merged

Conversation

spikecurtis
Copy link
Contributor

Related to #3125 - JetBrains use the exit code of commands they run over SSH for logic, so it is important that we report them correctly. They also request a TTY so they can send interrupt signals.

This PR updates our SSH agent to correctly return exit codes from SSH commands when a TTY is requested. Previously, we ignored errors from the TTY entirely.

It fixes a previously ignored error where both the PTY code and the Agent code would Wait() on the command process, resulting in a race where one waiter would get a SystemCallError instead of the exit status. Now only the PTY waits on the actual command. Users can now Wait() on the PTY to get the results of the inner command running in it.

It also corrects an issue for non-TTY commands where we return exit code 1 any time the command returned non-zero: we now correctly return the actual exit code.

@spikecurtis spikecurtis requested a review from a team July 25, 2022 23:28
@spikecurtis spikecurtis enabled auto-merge (squash) July 26, 2022 17:22
agent/agent.go Outdated
// CoderSessionErrorCode indicates that something went wrong with the session, rather than the
// command just returning a nonzero exit code, and is chosen as an arbitrary, high number
// unlikely to shadow other exit codes, which are typically 1, 2, 3, etc.
CoderSessionErrorCode = 229
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should call this "Magic" instead? It's specific to Coder, but that's somewhat implicit due to the repo name.

agent/agent.go Outdated
if err != nil {
a.logger.Warn(ctx, "ssh session failed", slog.Error(err))
_ = session.Exit(1)
_ = session.Exit(CoderSessionErrorCode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment explaining why exiting with 1 is bad could be helpful for future readers.

pty/pty_other.go Outdated
return p.cmd.Process.Kill()
}

func (p *otherPtyWithProcess) waitInternal() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure I understand why we need to abstract this. It seems both implementations use the process and wait. I feel like we could switch based on OS in the Start func instead, and still return the underlying *os.Process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not safe for multiple goroutines to Wait() on an os.Process. This creates a race where one of them can get a SyscallError complaining that the child process doesn't exist.

We didn't notice this before because we were throwing away the errors in a TTY, but we do actually need them to return the correct exit code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I see. What do you think about returning an abstract Process in that case instead of with the TTY? Seems like they are separate concerns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can separate them out. The underlying impl will need a reference to the pty in the Linux case, so that we can call runtime.KeepAlive, but that will be hidden from the caller.

Signed-off-by: Spike Curtis <spike@coder.com>
@spikecurtis spikecurtis force-pushed the spike/3125_ssh_tty_exit_code branch from b4ba237 to b60d811 Compare July 27, 2022 17:35
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
@@ -29,6 +29,16 @@ type PTY interface {
Resize(height uint16, width uint16) error
}

// Process represents a process running in a PTY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be helpful to explain why this abstraction is necessary.

@spikecurtis spikecurtis merged commit 36ffdce into main Jul 27, 2022
@spikecurtis spikecurtis deleted the spike/3125_ssh_tty_exit_code branch July 27, 2022 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants