Skip to content

feat(cli): Add support for delay_login_until_ready #5851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 27, 2023

Conversation

mafredri
Copy link
Member

@mafredri mafredri commented Jan 25, 2023

This PR adds support to ssh and coder ssh for delay_login_until_ready.

Ref: #5749

Here are examples of when coder_agent.delay_login_until_ready = true (for false, behavior is as before, sans formatting changes):

❯ coder ssh test

 > The workspace agent is taking longer than expected to
   start. See troubleshooting instructions at:
   https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates

⢎⡱ Waiting for  main  to finish starting up...

On 30+s start, start_timeout => start_error:

❯ coder ssh test

 > Don't panic, your workspace is starting up!

 > The workspace agent is taking longer than expected to
   start. See troubleshooting instructions at:
   https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates

 > The workspace agent ran into a problem during startup.
   See troubleshooting instructions at:
   https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates

Agent startup script exited with non-zero status, use --no-wait to login anyway.
Run 'coder ssh --help' for usage.

Ultimately in this case, the user can log in with coder ssh --skip-delay-login-until-ready.

We might want to change behavior of start_error to exit with non-zero status instead blocking.

cli/ssh.go Outdated
@@ -242,6 +244,7 @@ func ssh() *cobra.Command {
cliflag.BoolVarP(cmd.Flags(), &forwardGPG, "forward-gpg", "G", "CODER_SSH_FORWARD_GPG", false, "Specifies whether to forward the GPG agent. Unsupported on Windows workspaces, but supports all clients. Requires gnupg (gpg, gpgconf) on both the client and workspace. The GPG agent must already be running locally and will not be started for you. If a GPG agent is already running in the workspace, it will be attempted to be killed.")
cliflag.StringVarP(cmd.Flags(), &identityAgent, "identity-agent", "", "CODER_SSH_IDENTITY_AGENT", "", "Specifies which identity agent to use (overrides $SSH_AUTH_SOCK), forward agent must also be enabled")
cliflag.DurationVarP(cmd.Flags(), &wsPollInterval, "workspace-poll-interval", "", "CODER_WORKSPACE_POLL_INTERVAL", workspacePollInterval, "Specifies how often to poll for workspace automated shutdown.")
cliflag.BoolVarP(cmd.Flags(), &skipDelayLoginUntilReady, "skip-delay-login-until-ready", "", "CODER_SSH_SKIP_DELAY_LOGIN_UNTIL_READY", false, "Specifies whether to login to a workspace that has not finished starting up (only applicable when the delay login until ready option is enabled).")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we go with something simpler, shorter here? Maybe --no-delay-login, --no-wait. I'd like to leave room for the opposite flag as well --[no-]wait to allow waiting even when the template doesn't specify it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--no-wait is succinct 👍

}
case codersdk.WorkspaceAgentLifecycleReady:
return nil
default:
Copy link
Member Author

@mafredri mafredri Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay or nay?

Suggested change
default:
case codersdk.WorkspaceAgentLifecycleStartError:
showMessage()
return errors.New("...")
default:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fail-open here. Let folks try to connect, but warn that it might not work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I agree. Let's make it so!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, this is what it looks like now:

❯ coder ssh test

 > The workspace agent ran into a problem during startup.
   See troubleshooting instructions at:
   https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates

Agent startup script exited with non-zero status, use --no-wait to login anyway.
Run 'coder ssh --help' for usage.

And with --no-wait we still warn about the problem:

❯ coder ssh test --no-wait

 > The workspace agent ran into a problem during startup.
   See troubleshooting instructions at:
   https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates

mafredri@test ~
❯


func waitingMessage(agent codersdk.WorkspaceAgent) (m *message) {
m = &message{
Prompt: "Don't panic, your workspace is booting up!",
Copy link
Member

@johnstcn johnstcn Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add directions on how to bypass this e.g.

To skip this check, use the `--no-wait` argument

(Or whatever we end up calling it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this message (this state is before agent is connected so we can't connect yet).

We could add it for the other message, but I wonder if it would encourage bypassing the prompt, esp. without understanding what the flag does. You just go Ctrl+C -> --no-wait yay I'm in (and not everything is working). Would it suffice to explain this in the troubleshooting link?

I think this need will also be alleviated by streaming the startup log once we have it available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it suffice to explain this in the troubleshooting link?

That's fine either

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is definitely a good step in troubleshooting failed workspace builds.

One possible avenue of future work I see is adding WorkspaceAgentDownloading / WorkspaceAgentDownloaded states. We could add a random pre-shared nonce (even the build ID could work here) to the workspace agent downloaded URL and use that to determine if the workspace attempted to download the agent.

@mafredri
Copy link
Member Author

Made a recording to see this in action: asciicast

@mafredri
Copy link
Member Author

Updated messaging:

⠈⠱ Waiting for  main  to become ready...

 > Don't panic, your workspace agent has connected and
   the workspace is getting ready!

 > The workspace is taking longer than expected to get
   ready, the agent startup script is still executing.
   See troubleshooting instructions at:
   https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates

 > The workspace ran into a problem while getting ready,
   the agent startup script exited with non-zero status.
   See troubleshooting instructions at:
   https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates

Agent startup script exited with non-zero status, use --no-wait to login anyway.
Run 'coder ssh --help' for usage.
❯ coder ssh test --no-wait

 > Your workspace is still getting ready, it may be in an
   incomplete state.

mafredri@test ~
❯

m.Prompt = "The workspace is taking longer than expected to get ready, the agent startup script is still executing."
case codersdk.WorkspaceAgentLifecycleStartError:
m.Spin = ""
m.Prompt = "The workspace ran into a problem while getting ready, the agent startup script exited with non-zero status."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this copy! Once we start streaming build logs, I imagine we can also link to view the logs + the troubleshooting URL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes definitely! We can also stream the log in the terminal while the user is waiting, like 5 latest rows (keeps updating). Maybe that'd enabled/disabled via flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's even better!

@mafredri mafredri merged commit a753703 into main Jan 27, 2023
@mafredri mafredri deleted the mafredri/feat-delay-login-until-ready branch January 27, 2023 17:05
@github-actions github-actions bot locked and limited conversation to collaborators Jan 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants