Define: clear path to debugging a broken workspace #1321

Closed
ammario opened this issue May 6, 2022 · 10 comments
Labels
api Area: HTTP API

ammario (Member) commented May 6, 2022

Ok, so I have an old workspace called ding, but I can never SSH into it. I've tried restarting it to no avail.

The natural thing to do is to dig into the Terraform, find the underlying resource, and then check on the agent. While I can list templates, I have no way of viewing the underlying Terraform or the configuration values.

Lame workaround

The only workaround is asking the Coder admin where the Terraform source is and what values they configured the template with. This makes the Coder admin a bottleneck, and even they have to rely on memory instead of information from the Coder product.

tjcran commented May 7, 2022

Leaving this one in Community MVP, but removing it from the switch blockers. This issue causes confusion and hurts UX, and resolving it will make failed provisions easier to debug and troubleshoot, but it is a nice-to-have rather than a blocker for flipping the switch to public in May.

tjcran added this to the Community MVP milestone May 7, 2022

ammario (Member, Author) commented May 9, 2022

@tjcran have you explored potential solutions to this problem? Maybe there's something simple we can do that makes it a lot better.

Keep in mind this problem doesn't exist in v1: once you get the infrastructure working, the configs don't change, so it will probably keep working.

dwahler (Contributor) commented May 19, 2022

Does being able to dump the Terraform template/state from the CLI seem sufficient for this, or is there more to do here?

FWIW, I also recently ran into some issues with my dev environment where this would have been helpful.

ammario (Member, Author) commented May 19, 2022

@dwahler State push/pull is a necessary addition to the product, but we have to be careful about leaking secrets.

dwahler (Contributor) commented May 19, 2022

Good point. Maybe something that's equivalent to terraform state show would be better, since that redacts variables that are marked as sensitive.
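As a rough illustration of the redaction dwahler describes, here is a minimal Python sketch that masks sensitive attributes in a Terraform state document such as the JSON printed by coder state pull. It assumes the version 4 state layout, where each resource instance may list its sensitive attribute paths under sensitive_attributes as lists of steps like {"type": "get_attr", "value": "password"}. The function name redact_state is hypothetical, only top-level attributes are handled, and this is not how terraform state show itself is implemented.

```python
import copy

# Placeholder text, chosen to resemble what Terraform prints for sensitive values.
REDACTED = "(sensitive value)"


def redact_state(state: dict) -> dict:
    """Return a deep copy of a v4 state dict with sensitive attributes masked.

    Assumption: each instance may carry "sensitive_attributes", a list of
    paths, where each path is a list of steps. Only the first step of each
    path is used, so a whole top-level attribute is masked even when only
    a nested field inside it was marked sensitive (errs on the safe side).
    """
    redacted = copy.deepcopy(state)  # never mutate the caller's state
    for resource in redacted.get("resources", []):
        for instance in resource.get("instances", []):
            attrs = instance.get("attributes", {})
            for path in instance.get("sensitive_attributes", []):
                if not path:
                    continue
                step = path[0]
                if step.get("type") == "get_attr" and step.get("value") in attrs:
                    attrs[step["value"]] = REDACTED
    return redacted
```

To try it against a real workspace, you could parse the output of coder state pull <workspace> with json.loads and print json.dumps(redact_state(state), indent=2). Masking the whole top-level attribute is a deliberate simplification: it can hide more than necessary, but it never leaks a value that was marked sensitive at a nested path.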

ammario (Member, Author) commented May 21, 2022

> Good point. Maybe something that's equivalent to terraform state show would be better, since that redacts variables that are marked as sensitive.

We do have coder state pull <workspace>, which helps. For example:

$ coder state pull ab | jq -r '.resources[] | select(.type=="kubernetes_deployment").instances[0].attributes.metadata[0].name'

pulls the deployment name for my workspace. This is cumbersome, but perhaps sufficient.
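A slightly more general version of that jq filter, sketched in Python under the same assumptions about the state JSON (a top-level resources list whose entries carry mode, type, name, and instances[].attributes), prints every resource address and appends the Kubernetes metadata name where one exists. This is one way to approach the "dump all the names of the resources used" idea raised later in the thread; resource_summary is a hypothetical helper, and it only inspects the first instance of each resource.

```python
def resource_summary(state: dict) -> list:
    """Return one line per resource in a Terraform v4 state dict.

    Each line is the resource address ("data." prefix for data sources).
    If the first instance has a Kubernetes-style metadata block, its name
    is appended, similar to what the jq filter extracts for deployments.
    """
    lines = []
    for resource in state.get("resources", []):
        address = f'{resource["type"]}.{resource["name"]}'
        if resource.get("mode") == "data":
            address = "data." + address
        # Only the first instance is examined (count/for_each are ignored).
        instances = resource.get("instances") or [{}]
        attrs = instances[0].get("attributes") or {}
        metadata = attrs.get("metadata")
        if isinstance(metadata, list) and metadata and isinstance(metadata[0], dict):
            name = metadata[0].get("name")
            if name:
                address += f" (metadata.name={name})"
        lines.append(address)
    return lines
```

For example, a state containing a kubernetes_deployment named main plus a coder_workspace data source would produce lines like "kubernetes_deployment.main (metadata.name=...)" and "data.coder_workspace.me". Feeding it the captured output of coder state pull avoids writing a new jq filter for each resource type.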

misskniss commented

@kylecarbs suggested setting an agent connection timeout through Terraform during grooming today.

misskniss commented

@dwahler suggested waiting for the agent to come up, since a user cannot do anything until they are connected anyway.

kylecarbs (Member) commented

The timeout should be set via our Terraform provider. Here's an example:

resource "coder_agent" "dev" {
  timeout = "1m" # proposed setting
}

On SSH, the user could be shown a message that the agent timed out. That isn't perfect, but it is better than spinning forever.
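The behavior proposed here, waiting for the agent up to a deadline and then failing with an explicit message instead of spinning, can be sketched as a simple polling loop. This is a minimal Python illustration, not Coder's implementation: agent_connected stands in for a hypothetical check of the agent's status, the default of 60 seconds mirrors the "1m" in the example above, and the clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


class AgentTimeoutError(Exception):
    """Raised when the workspace agent does not connect before the deadline."""


def wait_for_agent(
    agent_connected: Callable[[], bool],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll agent_connected() until it returns True or timeout_s elapses."""
    deadline = now() + timeout_s
    while not agent_connected():
        remaining = deadline - now()
        if remaining <= 0:
            # Fail loudly with something actionable instead of hanging.
            raise AgentTimeoutError(
                f"agent did not connect within {timeout_s:g}s; "
                "check the workspace's underlying resources and agent logs"
            )
        # Never sleep past the deadline.
        sleep(min(poll_s, remaining))
```

Using a monotonic clock keeps the deadline correct even if the system clock changes while waiting. An SSH command built on this could catch AgentTimeoutError and print its message in place of an indefinite spinner.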

misskniss commented May 31, 2022

Moving this to an Epic after our grooming discussion. Tickets needed:

  • Time out on agent connect after a successful build (2 min? 10 min?). This one needs to be fixed before the Community MVP launch.
  • Add a CLI command and show the agent connection failure to users on the frontend (FE). Potentially dump the names of all the resources used, or instead add metadata (a key-value mapping related to a resource) plus an FE ticket to show this information to the user. See ticket Get infrastructure link for each resource #1325.
  • Make the error shown when a build fails more useful: currently the error property is just "Exit 1".
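For the last ticket, one way to make a bare "Exit 1" more useful is to attach the relevant part of the provisioner's log output to the error. The sketch below is hypothetical: format_build_error and its inputs (an exit code plus the collected log lines) are assumptions for illustration, not part of Coder's API. It prefers lines that look like Terraform "Error:" diagnostics, including ones prefixed with the "│" box-drawing border newer Terraform versions print, and otherwise falls back to the last few log lines.

```python
def format_build_error(exit_code: int, log_lines: list, tail: int = 5) -> str:
    """Build a readable failure message from an exit code and log output."""
    # Drop blank lines and trailing whitespace.
    lines = [line.rstrip() for line in log_lines if line.strip()]
    # Lines that look like Terraform diagnostics, ignoring a leading "│" border.
    errors = [line for line in lines if line.lstrip("│ \t").startswith("Error:")]
    context = errors if errors else lines[-tail:]
    message = f"build failed with exit code {exit_code}"
    if context:
        message += ":\n" + "\n".join(f"  {line}" for line in context)
    return message
```

Falling back to the log tail means the user still gets some context when the failure comes from something other than a Terraform diagnostic, such as a provider crash.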

misskniss changed the title from "No clear path to debugging a broken workspace" to "Define: clear path to debugging a broken workspace" Jun 21, 2022
misskniss added the epic label Jun 21, 2022
f0ssel closed this as completed Jul 28, 2022
f0ssel closed this as not planned Jul 28, 2022
7 participants