
Improve template/workspace build diagnostic #15447



Open
matifali opened this issue Nov 8, 2024 · 4 comments
Labels
need-backend (Issues that need backend work), onboarding-improvement, site (Area: frontend dashboard)

Comments

@matifali
Member

matifali commented Nov 8, 2024

Problem Description

Template creation in Coder can fail because of various setup and infrastructure requirements, and Coder does not provide clear diagnostics for troubleshooting these failures, which hurts the user experience. This is especially critical for new deployments evaluating Coder.

Templates can be created via:

  1. CLI: coder templates push.
  2. UI: Uploading an archive or selecting built-in templates.

During provisioning, Coder runs terraform plan and terraform apply. Failures may occur due to unmet dependencies, incorrect configurations, or provider authentication issues, but diagnostics are limited, often only showing a generic terraform exit code 1 without specifics. For instance, the Docker template requires:

  • Access to the Docker socket.
  • Coder user permissions to access the socket.

Failures vary across setups, such as Docker Compose, Kubernetes, and system services, making troubleshooting challenging. Missing diagnostics for Docker socket, permissions, or provider authentication errors contribute to user frustration and delays in setup.

Desired Solution

To improve diagnostics, we need actionable logs, UI hints, and expanded documentation for common setup issues.

  1. Enhanced Logging and Diagnostics:

    • Capture specific error points in terraform plan and terraform apply, especially around Docker socket access and provider authentication.
    • Log provider-specific authentication errors (e.g., AWS credentials) and expose them in both logs and UI.
  2. UI Feedback and Suggested Fixes:

    • Display Terraform error details in the UI, with actionable guidance based on the issue type:
      • Docker Socket Unavailability: Suggest relevant fixes depending on the environment (Docker Compose, Kubernetes, etc.), such as adding the Docker group with group_add in Docker Compose or setting up Docker group permissions.
      • Provider Authentication Errors: Show hints for authentication failures, like AWS or GCP, and provide links to troubleshooting documentation.
  3. Documentation Update:

    • Add troubleshooting guides specific to template creation failures, covering Docker socket access, permissions setup, and provider authentication, with examples for Docker Compose, Kubernetes, and system services.
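A minimal sketch of the "actionable guidance" idea, assuming the provisioner has the raw Terraform error text available: match known substrings in the output and attach a hint. All names here (HintKind, Classify, the rule table) and the hint wording are hypothetical, not part of Coder. The matched strings are common Docker client messages and the AWS provider's credential error, but exact wording varies across versions, so treat them as illustrative.

```go
package main

import "strings"

// HintKind is a hypothetical category for a template build failure.
type HintKind string

const (
	HintUnknown      HintKind = "unknown"
	HintDockerSocket HintKind = "docker-socket-unavailable"
	HintDockerPerms  HintKind = "docker-socket-permission"
	HintProviderAuth HintKind = "provider-authentication"
)

// rule maps a lowercase substring of Terraform output to a hint and advice text.
type rule struct {
	needle string
	kind   HintKind
	advice string
}

// rules holds illustrative patterns only; real provider messages differ by version.
// Order matters: the first matching rule wins.
var rules = []rule{
	{"permission denied while trying to connect to the docker daemon socket", HintDockerPerms,
		"Give the Coder user access to the Docker socket (docker group, or group_add in Docker Compose)."},
	{"cannot connect to the docker daemon", HintDockerSocket,
		"Make sure Docker is running and /var/run/docker.sock is available to the Coder server."},
	{"no valid credential sources", HintProviderAuth,
		"Configure AWS credentials for the provisioner (environment variables, shared config, or instance role)."},
}

// Classify returns the first matching hint for a Terraform error output,
// or HintUnknown if no rule matches.
func Classify(output string) (HintKind, string) {
	lower := strings.ToLower(output)
	for _, r := range rules {
		if strings.Contains(lower, r.needle) {
			return r.kind, r.advice
		}
	}
	return HintUnknown, "See the full Terraform output for details."
}
```

A table of substring rules keeps each provider's knowledge in data rather than code, which fits the suggestion to start with a few providers (Docker, Kubernetes, AWS) and grow the list over time.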

Implementing these steps will streamline template creation, enabling users to resolve failures independently and improving the initial user experience with Coder.

Related issues:

coder-labeler bot added the docs (Area: coder.com/docs) and needs-triage (Issue that require triage) labels Nov 8, 2024
matifali added the site (Area: frontend dashboard) and need-backend (Issues that need backend work) labels and removed the needs-triage (Issue that require triage) label Nov 8, 2024
@johnstcn
Member

The scope of this issue is fairly wide. I think it might need to be broken down a bit further.

Capture specific error points in terraform plan and terraform apply, especially around Docker socket access and provider authentication.

Which specific error points?

Log provider-specific authentication errors (e.g., AWS credentials) and expose them in both logs and UI.

We'll need to add code for each specific provider. How about we focus on the top 3: Docker, Kubernetes, AWS?

Add troubleshooting guides specific to template creation failures, covering Docker socket access, permissions setup, and provider authentication, with examples for Docker Compose, Kubernetes, and system services.

Are these not covered by our existing documentation?

@matifali
Member Author

Are these not covered by our existing documentation?

They are, but they could be improved with explicit instructions (specifically regarding Docker socket permission issues in the context of Coder). We also need to link them from within the dashboard so users are taken to actionable troubleshooting steps. I would prefer error codes and docs similar to the ones we have for the health status page.

We'll need to add code for each specific provider. How about we focus on the top 3: Docker, Kubernetes, AWS?

Yes, we can start with these three. It would also be nice to run some preflight checks and suggest or auto-import a template that the Coder server can connect to.
For example, if the Docker socket is available and the Coder user has access to it, suggest using the Docker template.

Which specific error points?

More verbose template build errors in the web template editor. They could relate to anything from wrong syntax to provider authentication issues.

matifali changed the title from "Improve template build diagnostic" to "Improve template/workspace build diagnostic" Jan 6, 2025
EdwardAngert added the docs (Area: coder.com/docs) label and removed the docs (Area: coder.com/docs) label Feb 7, 2025
@tomelliff

I just got linked to this issue via support. This would be super helpful for letting users debug some of their own issues.

We're using Kubernetes, and a workspace build with the wrong image tag (so it can't pull the image) or one requesting more resources than can be scheduled leads to a completely opaque error. I've added some internal docs on how people should debug this (describe the deployment to see the events, then map those event error strings to what is actually wrong and how to fix it), but it's hard to make those docs discoverable and for people to know what to do.

If instead Coder could return those errors (either the raw errors from the Kubernetes events for the deployment or, better yet, something less raw and more user-facing), that would help a ton compared to just a timeout waiting for the deployment to complete.

EdwardAngert removed the docs (Area: coder.com/docs) label Apr 9, 2025
@EdwardAngert
Contributor

Removing from docs for now; once we have it implemented or part of a release, we'll be able to document it.
