Skip to content

fix: remove pty match for TestSSH/RemoteForward #10789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 20, 2023

Conversation

spikecurtis
Copy link
Contributor

Fixes #10578

Copy link
Contributor Author

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

@spikecurtis spikecurtis force-pushed the spike/10578-flake-ssh-remote-forward branch from 043bda0 to 0930d7a Compare November 20, 2023 13:33
// true exits the loop.
return true
}
resp, err := http.DefaultClient.Do(req)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this may leave idle connections after it (triggering leak detection). If it does though, perhaps we just need to handle it in test main so we can keep using the default client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen any. I think idle connections are cleaned up promptly.

assert.NoError(t, err)
assert.EqualValues(t, "hello world", body)
return true
}, testutil.WaitLong, testutil.IntervalFast)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous matcher waited for i-am-ready echoed by the startup script. It means that the agent didn't start yet, and localhost:8222 is not available. I'm curious where is the trick in the new implementation. Is it a matter of testutil.WaitLong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, before we waited for i-am-ready and then connected.

Here, we just (re)try connecting in a loop until success or a timeout.

Copy link
Member

@mtojek mtojek Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the same root cause? The agent is still not ready to connect. Are you just trying to reduce complexity and remove extra conditions?

EDIT:

I don't argue with the new implementation. I'm trying to understand the root cause. Would extending the timeout for pty "expect match" work as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we used PTY/expect to wait until we thought the agent was connected and SSH remote forward was up, then we connected and verified the output. The root cause of the flake is that we didn't always get the startup script log because of the way the ssh command works. It detects that a startup script is running, then asks to follow logs, but the log may have already been dispatched and we miss it, causing the expect to hang.

In the new implementation we do away with PTY/expect entirely, and thus dodge the root cause from before. To deal with the fact that we no longer get an explicit signal that the remote forward is up, we just retry it multiple times until it comes up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause of the flake is that we didn't always get the startup script log because of the way the ssh command works.

Ok, that's the part I wasn't aware of. Thanks!

@spikecurtis spikecurtis merged commit 92ef0ba into main Nov 20, 2023
@spikecurtis spikecurtis deleted the spike/10578-flake-ssh-remote-forward branch November 20, 2023 16:50
Copy link
Contributor Author

Merge activity

@github-actions github-actions bot locked and limited conversation to collaborators Nov 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flake: TestSSH/RemoteForward: match deadline exceeded
3 participants