# Troubleshooting templates

Occasionally, you may run into scenarios where a workspace is created, but the
agent is either not connected or the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
has failed or timed out.

## Agent connection issues

If the agent is not connected, it means the agent or
[init script](https://github.com/coder/coder/tree/main/provisionersdk/scripts)
has failed on the resource.

```console
$ coder ssh myworkspace
⢄⡱ Waiting for connection from [agent]...
```

While troubleshooting steps vary by resource, here are some general best
practices:

- Ensure the resource has `curl` installed (alternatively, `wget` or `busybox`)
- Ensure the resource can `curl` your Coder
  [access URL](../../admin/configure.md#access-url)
- Manually connect to the resource and check the agent logs (e.g.,
  `kubectl exec`, `docker exec`, or the AWS console)
  - The Coder agent logs are typically stored in `/tmp/coder-agent.log`
  - The Coder agent startup script logs are typically stored in
    `/tmp/coder-startup-script.log`
  - The Coder agent shutdown script logs are typically stored in
    `/tmp/coder-shutdown-script.log`
- If Coder runs behind a reverse proxy, ensure WebSockets are forwarded
  correctly, since incorrect WebSocket forwarding can also cause this issue.
  [Read our reverse-proxy docs](../../admin/configure.md#tls--reverse-proxy)
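
The checks in the list above can be scripted. The following is a minimal sketch meant to be run inside the resource (for example, through `kubectl exec` or `docker exec`). The function names are hypothetical, and the log locations are the typical defaults mentioned above, which may differ in your template.

```shell
#!/bin/sh
# Sketch: diagnose why a Coder agent may not be connecting.
# All function names are hypothetical helpers, not part of Coder.

# Print the first available download tool (curl, wget, or busybox).
# Returns non-zero if none is installed.
find_downloader() {
  for tool in curl wget busybox; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}

# Check that the given URL (your Coder access URL) responds within
# 10 seconds. HTTP error responses (4xx/5xx) count as failures.
check_access_url() {
  case "$(find_downloader)" in
    curl) curl -fsS -o /dev/null --max-time 10 "$1" ;;
    wget) wget -q -O /dev/null -T 10 "$1" ;;
    busybox) busybox wget -q -O /dev/null -T 10 "$1" ;;
    *)
      echo "no download tool found (curl, wget, or busybox)" >&2
      return 1
      ;;
  esac
}

# Print the last 20 lines of each agent log found in a directory
# (default: /tmp, where the logs are typically stored).
show_agent_logs() {
  log_dir="${1:-/tmp}"
  for log in coder-agent.log coder-startup-script.log coder-shutdown-script.log; do
    if [ -f "$log_dir/$log" ]; then
      echo "==> $log_dir/$log <=="
      tail -n 20 "$log_dir/$log"
    fi
  done
}
```

For example, `find_downloader` prints which tool is available. `check_access_url https://coder.example.com && echo reachable` tests connectivity (replace the placeholder URL with your access URL). `show_agent_logs` prints the end of each log it finds in `/tmp`.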

## Startup script issues

Depending on the contents of the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script),
and whether the
[startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior)
is set to blocking or non-blocking, you may notice issues related to the startup
script. This section covers common scenarios and how to resolve them.

### Unable to access workspace, startup script is still running

If you can't access your workspace because the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
is still running, either the
[startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior)
option is set to blocking, or you enabled the `--wait=yes` option (e.g., for
`coder ssh` or `coder config-ssh`). In that case, you can still access the
workspace through the web terminal, or via SSH with the `--wait=no` option. If
the startup script runs longer than it should or never completes, try to
[debug the startup script](#debugging-the-startup-script) to resolve the issue.
Alternatively, you can force the startup script to exit by terminating the
processes it started or the startup script itself (on Linux, `ps` and `kill`
are useful tools).

For tips on how to write a startup script that doesn't run forever, see the
[`startup_script`](#startup_script) section. For more ways to override the
startup script behavior, see the
[`startup_script_behavior`](#startup_script_behavior) section.
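
A startup script can end up running forever if it starts a long-running command, such as a development server, in the foreground. The minimal sketch below assumes the process should outlive the script. It starts the process in the background with `nohup` and sends its output to a log file, so the script can continue and exit. `start_background` is a hypothetical helper name, not part of Coder.

```shell
#!/bin/sh
# Sketch: start a long-running process without blocking the startup script.
# start_background is a hypothetical helper, not a Coder API.
#
# Usage: start_background LOGFILE COMMAND [ARGS...]
# Runs COMMAND in the background, immune to hangups (nohup), with stdin
# from /dev/null and stdout/stderr redirected to LOGFILE, then prints the
# background process ID.
start_background() {
  logfile="$1"
  shift
  nohup "$@" >"$logfile" 2>&1 </dev/null &
  echo "$!"
}
```

For example, `start_background /tmp/server.log python3 -m http.server 8000` returns immediately instead of waiting for the server to exit. Redirecting the output matters. A background process that still holds the script's standard output or error open keeps that stream from closing, which can leave the script looking unfinished to whatever reads its output.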

Template authors can also set the
[startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior)
option to non-blocking, which will allow users to access the workspace while the
startup script is still running. Note that the workspace must be updated after
changing this option.

### Your workspace may be incomplete

If you see a warning that your workspace may be incomplete, it means that
programs, files, or settings may be missing from your workspace. This can
happen if the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
is still running or has exited with a non-zero status (see
[startup script error](#startup-script-exited-with-an-error)). No action is
necessary, but you may want to
[start a new shell session](#session-was-started-before-the-startup-script-finished)
after the script has completed, or check the
[startup script logs](#debugging-the-startup-script) for any issues.

### Session was started before the startup script finished

The web terminal may show this message if the session was started before the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
finished, but the startup script has since finished. You can safely dismiss
this message; however, be aware that your preferred shell or dotfiles may not
yet be active in this session. You can either start a new session or source
your dotfiles manually. Note that starting a new session terminates any
commands running in the current terminal, and you may lose unsaved work.

Examples for activating your preferred shell or sourcing your dotfiles:

- `exec zsh -l`
- `source ~/.bashrc`

### Startup script exited with an error

When the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
exits with an error, it means the last command run by the script failed. When
`set -e` is used, any failing command immediately exits the script, and the
remaining commands are not executed. In either case,
[your workspace may be incomplete](#your-workspace-may-be-incomplete). If you
see this error, check the
[startup script logs](#debugging-the-startup-script) to figure out what the
issue is.
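
The effect of `set -e` can be seen in isolation. In this sketch, the hypothetical `demo_set_e` function runs a tiny script in a child shell. `false` fails, so the second `echo` never runs, and the child shell exits with the failing command's status.

```shell
#!/bin/sh
# Sketch: show how `set -e` stops a script at the first failing command.
# demo_set_e is a hypothetical name used only for this illustration.
demo_set_e() {
  sh -c '
    set -e
    echo "step 1: ok"
    false # fails with status 1, so the script exits here
    echo "step 2: never printed"
  '
}

status=0
demo_set_e || status=$?
echo "exit status: $status"
# Prints:
# step 1: ok
# exit status: 1
```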

Common causes for startup script errors:

- A missing command or file
- A command that fails due to missing permissions
- Network issues (e.g., unable to reach a server)
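
One way to make these failures easier to diagnose is to check prerequisites at the top of the script and print a clear error before anything else runs, as in the sketch below. The helper names are hypothetical. Network checks could follow the same pattern using `curl` with a timeout.

```shell
#!/bin/sh
# Sketch: fail early with a clear message instead of partway through the
# startup script. require_cmd and require_readable are hypothetical helpers.

# Return non-zero (with an error message) if a command is not installed.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: required command not found: $1" >&2
    return 1
  fi
}

# Return non-zero (with an error message) if a file is missing or cannot be
# read by the current user.
require_readable() {
  if [ ! -r "$1" ]; then
    echo "error: cannot read $1 (missing file or insufficient permissions)" >&2
    return 1
  fi
}
```

For example, a script that needs `git` and an SSH key could start with `require_cmd git && require_readable "$HOME/.ssh/id_ed25519" || exit 1`.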

### Debugging the startup script

The simplest way to debug the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
is to open the workspace in the Coder dashboard and click "Show startup log" (if
it's not already visible). This shows all the output from the script. Another
option is to view the log file inside the workspace (usually
`/tmp/coder-startup-script.log`). If the logs don't indicate what's going
wrong, you can increase verbosity by adding `set -x` to the top of the startup
script (note that this prints every command as it runs and may expose
sensitive information). Alternatively, you can add `echo` statements to show
what's going on.

Here's a short example of an informative startup script:

```shell
echo "Running startup script..."

# Log the command before running it (replace the path with your own command).
echo "Run: long-running-command"
/path/to/long-running-command
status=$?
echo "Done: long-running-command, exit status: ${status}"

# Stop the script if the command failed, preserving its exit status.
if [ "$status" -ne 0 ]; then
  echo "Startup script failed, exiting..."
  exit "$status"
fi
```

> **Note:** We don't use `set -x` here because we're manually echoing the
> commands. This protects against sensitive information being shown in the log.

This script tells us what command is being run and what the exit status is. If
the exit status is non-zero, it means the command failed and we exit the script.
Since we are manually checking the exit status here, we don't need `set -e` at
the top of the script to exit on error.
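
When a startup script runs many commands, the echo-and-check pattern above can be wrapped in a small function. In this sketch, `run` is a hypothetical helper. It logs a caller-supplied label rather than the expanded arguments (which could contain secrets). It also returns the command's status instead of exiting, leaving that decision to the caller.

```shell
#!/bin/sh
# Sketch: a hypothetical helper that logs, runs, and reports a command.
#
# Usage: run LABEL COMMAND [ARGS...]
# Logs LABEL instead of the expanded arguments, so values such as tokens
# passed as arguments are not written to the startup log.
run() {
  label="$1"
  shift
  echo "Run: $label"
  "$@"
  status=$?
  echo "Done: $label, exit status: ${status}"
  return "$status"
}
```

For example, `run "long-running-command" /path/to/long-running-command || exit $?` stops the script with the command's exit status if it fails.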

> **Note:** If you aren't seeing any logs, check that the `dir` attribute of the
> `coder_agent` resource points to a valid directory in the file system.