Skip to content

fix: remove pty match for TestSSH/RemoteForward #10789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 20, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 22 additions & 16 deletions cli/ssh_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -636,6 +636,8 @@ func TestSSH(t *testing.T) {
defer httpServer.Close()

client, workspace, agentToken := setupWorkspaceForAgent(t, nil)
_ = agenttest.New(t, client.URL, agentToken)
coderdtest.AwaitWorkspaceAgents(t, client, workspace.ID)

inv, root := clitest.New(t,
"ssh",
Expand All @@ -644,32 +646,36 @@ func TestSSH(t *testing.T) {
"8222:"+httpServer.Listener.Addr().String(),
)
clitest.SetupConfig(t, client, root)
pty := ptytest.New(t).Attach(inv)
inv.Stderr = pty.Output()

ctx, cancel := context.WithTimeout(context.Background(), testutil.WaitLong)
defer cancel()

cmdDone := tGo(t, func() {
err := inv.WithContext(ctx).Run()
assert.NoError(t, err, "ssh command failed")
// fails because we cancel context to close
assert.Error(t, err, "ssh command should fail")
})

// Agent is still starting
pty.ExpectMatch("Waiting")

_ = agenttest.New(t, client.URL, agentToken)
coderdtest.AwaitWorkspaceAgents(t, client, workspace.ID)

// Startup script has just finished
pty.ExpectMatch(startupScriptPattern)

// Download the test page
pty.WriteLine("curl localhost:8222")
pty.ExpectMatch("hello world")
require.Eventually(t, func() bool {
req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://localhost:8222/", nil)
if !assert.NoError(t, err) {
// true exits the loop.
return true
}
resp, err := http.DefaultClient.Do(req)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this may leave idle connections after it (triggering leak detection). If it does though, perhaps we just need to handle it in test main so we can keep using the default client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen any. I think idle connections are cleaned up promptly.

if err != nil {
t.Logf("HTTP GET http://localhost:8222/ %s", err)
return false
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
assert.NoError(t, err)
assert.EqualValues(t, "hello world", body)
return true
}, testutil.WaitLong, testutil.IntervalFast)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous matcher waited for i-am-ready echoed by the startup script. It means that the agent didn't start yet, and localhost:8222 is not available. I'm curious where is the trick in the new implementation. Is it a matter of testutil.WaitLong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, before we waited for i-am-ready and then connected.

Here, we just (re)try connecting in a loop until success or a timeout.

Copy link
Member

@mtojek mtojek Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the same root cause? The agent is still not ready to connect. Are you just trying to reduce complexity and remove extra conditions?

EDIT:

I don't argue with the new implementation. I'm trying to understand the root cause. Would extending the timeout for pty "expect match" work as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we used PTY/expect to wait until we thought the agent was connected and SSH remote forward was up, then we connected and verified the output. The root cause of the flake is that we didn't always get the startup script log because of the way the ssh command works. It detects that a startup script is running, then asks to follow logs, but the log may have already been dispatched and we miss it, causing the expect to hang.

In the new implementation we do away with PTY/expect entirely, and thus dodge the root cause from before. To deal with the fact that we no longer get an explicit signal that the remote forward is up, we just retry it multiple times until it comes up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause of the flake is that we didn't always get the startup script log because of the way the ssh command works.

Ok, that's the part I wasn't aware of. Thanks!


// And we're done.
pty.WriteLine("exit")
cancel()
<-cmdDone
})

Expand Down