
Improve agent connection troubleshooting #15423

Open
2 of 8 issues completed
Feature
@matifali

Description


Problem Description

Coder agents sometimes fail to connect to the Coder server for a variety of reasons, including network restrictions (e.g., DNS issues, firewalls), missing permissions (e.g., CAP_NET_ADMIN), OS or architecture mismatches, and missing tools for downloading the agent binary. Currently, the UI offers little guidance to help users diagnose and resolve these issues, which delays troubleshooting.
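The permissions case can be checked before the agent starts. The sketch below assumes a Linux workspace and reads the effective capability mask (CapEff) from /proc; CAP_NET_ADMIN is capability number 12. The function names cap_bit_set and check_cap_net_admin are hypothetical and are not part of Coder's bootstrap script, and whether the agent actually needs CAP_NET_ADMIN depends on the deployment.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of Coder's bootstrap script):
# report whether CAP_NET_ADMIN (capability number 12 on Linux) is in the
# effective capability set.

# cap_bit_set MASK BIT: succeed if bit BIT is set in the hex MASK.
cap_bit_set() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Prints a diagnosis and returns 0 (present), 1 (missing) or 2 (unknown).
check_cap_net_admin() {
  # awk is exec'd from this shell, so the CapEff in its own /proc/self/status
  # approximates what another child process (such as the agent) would get.
  capeff=$(awk '/^CapEff:/ { print $2 }' /proc/self/status 2>/dev/null)
  if [ -z "$capeff" ]; then
    echo "info: cannot read capabilities (not Linux, or /proc unavailable)"
    return 2
  fi
  if cap_bit_set "$capeff" 12; then
    echo "ok: CAP_NET_ADMIN is available"
    return 0
  fi
  echo "warning: CAP_NET_ADMIN is missing from the effective capability set"
  return 1
}
```

Printing the result near the top of the script means the answer appears in the workspace log even when the agent never connects.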


For example, a failure in the agent bootstrap script can leave the agent unable to connect, with no clear indication of the root cause. When checking the workspace logs (e.g., docker logs <container name or container id>), a typical DNS failure looks like this:

+ trap waitonexit EXIT
+ mktemp -d -t coder.XXXXXX
+ BINARY_DIR=/tmp/coder.1uZgEp
+ BINARY_NAME=coder
+ BINARY_URL=http://coder.example.com:3000/bin/coder-linux-amd64
+ cd /tmp/coder.1uZgEp
+ :
+ status=
+ command -v curl
+ curl -fsSL --compressed http://coder.example.com:3000/bin/coder-linux-amd64 -o coder
curl: (6) Could not resolve host: coder.example.com
+ status=6
+ echo error: failed to download coder agent
+ echo        command returned: 6
+ echo Trying again in 30 seconds...
+ sleep 30
error: failed to download coder agent
command returned: 6
Trying again in 30 seconds...
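One way to make this log self-explanatory is to translate curl's exit status into a diagnosis at the point where the script already prints "command returned: 6". This is a minimal sketch under that assumption: describe_curl_error is a hypothetical helper, and the codes it handles (6, 7, 22, 28, 35, 60) come from curl's documented exit codes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Coder's bootstrap script): translate a
# curl exit status into a short, actionable diagnosis for the workspace log.
# Codes follow the EXIT CODES section of the curl manual.
describe_curl_error() {
  case "$1" in
    6)  echo "DNS lookup failed: could not resolve the Coder host. Check the workspace's DNS configuration or consult your network administrator." ;;
    7)  echo "Failed to connect to the Coder host: a firewall or proxy may be blocking the access URL." ;;
    22) echo "The server returned an HTTP error: verify the access URL and the agent binary path." ;;
    28) echo "The connection timed out: check firewall rules and proxy settings." ;;
    35|60) echo "TLS handshake or certificate verification failed: check the server certificate and the workspace's CA bundle." ;;
    *)  echo "curl failed with exit code $1." ;;
  esac
}
```

The retry loop could print describe_curl_error "$status" next to its existing error line, so the log above would also state that the Coder host name could not be resolved.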

Desired Solution

Implement enhanced diagnostics and UI hints that give users actionable guidance based on the detected issue. With specific suggestions shown directly in the UI, users can resolve connectivity issues faster and with less frustration. This includes:

  1. Enhanced Error Logging and Diagnostics

    • Log detailed error messages for each failure point, covering:
      • Network/DNS issues, with suggestions to verify DNS configuration or consult network administrators.
      • Download tool availability (e.g., curl or wget), with instructions on how to install the required tool.
      • OS/architecture mismatches, with a link to the supported environments in the documentation.
  2. UI Hints for Diagnosed Issues[1]

    • Network/DNS Issue: If a DNS or network error is detected, show a UI message like:
      “It appears there’s a DNS or firewall issue preventing the agent from connecting to the server. Learn more about network configuration.”
    • Download Tool Missing: If neither supported download tool (curl or wget) is available, show a hint:
      “Required download tool not found. Please install either curl or wget.”
    • Unsupported OS/Architecture: If OS or architecture compatibility issues arise, prompt users to check supported platforms:
      “This environment may be unsupported. Review supported OS and architectures.”
    • Download Logs: If the agent doesn't connect, link to documentation that explains how to fetch agent logs outside of Coder.
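The missing-tool and unsupported-platform hints both depend on checks the script can run before attempting a download. The following sketch assumes a POSIX shell with uname and tr available; find_download_tool, platform_suffix and check_binary_matches_host are hypothetical names, and the architecture list (amd64, arm64, armv7) is an assumption to reconcile with the supported platforms in the documentation.

```shell
#!/bin/sh
# Hypothetical preflight checks (not part of Coder's bootstrap script).

# Print the first supported download tool found on PATH, or fail.
find_download_tool() {
  for tool in curl wget; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  echo "Required download tool not found. Please install either curl or wget." >&2
  return 1
}

# Map uname -s / uname -m values to an "os-arch" suffix in the style of
# the coder-linux-amd64 binary name. The suffix list is an assumption.
platform_suffix() {
  os=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$2" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    armv7l|armv7)  arch=armv7 ;;
    *) echo "This environment may be unsupported ($1/$2). Review supported OS and architectures." >&2
       return 1 ;;
  esac
  echo "$os-$arch"
}

# Succeed if the binary URL ends in the suffix expected for this machine.
check_binary_matches_host() {
  expected=$(platform_suffix "$(uname -s)" "$(uname -m)") || return 1
  case "$1" in
    *"-$expected") return 0 ;;
    *) echo "Agent binary $1 does not match this host ($expected)." >&2
       return 1 ;;
  esac
}
```

For example, check_binary_matches_host "$BINARY_URL" on an aarch64 Linux host would reject a URL ending in coder-linux-amd64 before any download is attempted.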

Proposed Implementation

  • Backend Logging: Improve diagnostic logging in the agent bootstrap script to provide clearer insights into why each specific failure occurs.
  • UI Updates[1]: Implement conditional pop-ups or error messages in the Coder UI that guide users based on the diagnosed connectivity issue.
  • Documentation Update: Expand documentation with a troubleshooting section that covers all major connectivity blockers, including example configurations.
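For the backend logging item, each failure path could end with one line that has a fixed prefix, a category, and the same wording as the proposed UI hints, so the message is useful in docker logs today and easy to match later. The prefix coder-bootstrap-diagnosis and the category names are illustrative assumptions, not an existing Coder log format.

```shell
#!/bin/sh
# Hypothetical: emit one machine-readable line per diagnosed failure, using
# a fixed prefix that log tooling could match. Prefix and categories are
# illustrative only; hint texts reuse the wording proposed in this issue.
report_diagnosis() {
  category=$1
  case "$category" in
    network)       hint="It appears there's a DNS or firewall issue preventing the agent from connecting to the server." ;;
    download_tool) hint="Required download tool not found. Please install either curl or wget." ;;
    platform)      hint="This environment may be unsupported. Review supported OS and architectures." ;;
    *)             hint="Unknown failure." ;;
  esac
  printf 'coder-bootstrap-diagnosis category=%s hint="%s"\n' "$category" "$hint"
}
```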

Footnotes

  1. This may not currently be possible, as there is no way to expose these logs in the UI without the agent running.

Sub-issues

Metadata


Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
