Commit a77b48a

mafredri and johnstcn authored
docs(templates): document startup_script_behavior in-depth (#7857)
Fixes #7759

Co-authored-by: Cian Johnston <cian@coder.com>
1 parent 4f29f9a commit a77b48a

1 file changed

+96
-28
lines changed

docs/templates/index.md

@@ -145,12 +145,22 @@ by all child processes of the agent, including SSH sessions. See the
145145
[Coder Terraform Provider documentation](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent)
146146
for the full list of supported arguments for the `coder_agent`.
147147

148-
#### startup_script
148+
#### `startup_script`
149149

150150
Use the Coder agent's `startup_script` to run additional commands like
151151
installing IDEs, [cloning dotfiles](../dotfiles.md#templates), and cloning
152152
project repos.
153153

154+
**Note:** By default, the startup script is executed in the background.
155+
This allows users to access the workspace before the script completes.
156+
If you want to change this, see [`startup_script_behavior`](#startup_script_behavior) below.
157+
158+
Here are a few guidelines for writing a good startup script (more on these below):
159+
160+
1. Use `set -e` to exit the script if any command fails and `|| true` for commands that are allowed to fail
161+
2. Use `&` to start a process in the background, allowing the startup script to complete
162+
3. Inform the user about what's going on via `echo`
163+
154164
```hcl
155165
resource "coder_agent" "coder" {
156166
os = "linux"
@@ -163,27 +173,53 @@ resource "coder_agent" "coder" {
163173
# that does not require root permissions. Note that /tmp may be mounted in tmpfs which
164174
# can lead to increased RAM usage. To avoid this, you can pre-install code-server inside
165175
# the Docker image or VM image.
176+
echo "Installing code-server..."
166177
curl -fsSL https://code-server.dev/install.sh | sh -s -- --method=standalone --prefix=/tmp/code-server --version 4.8.3
167178
168179
# The & prevents the startup_script from blocking so the next commands can run.
169180
# The stdout and stderr of code-server is redirected to /tmp/code-server.log.
181+
echo "Starting code-server..."
170182
/tmp/code-server/bin/code-server --auth none --port 13337 >/tmp/code-server.log 2>&1 &
171183
172-
# var.repo and var.dotfiles_uri is specified
173-
# elsewhere in the Terraform code as input
174-
# variables.
184+
# Notice: var.repo and var.dotfiles_uri are specified elsewhere in the Terraform
185+
# code as input variables.
186+
REPO=${var.repo}
187+
DOTFILES_URI=${var.dotfiles_uri}
175188
176189
# clone repo
177190
ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts
178-
git clone --progress git@github.com:${var.repo}
191+
echo "Cloning $REPO..."
192+
git clone --progress git@github.com:"$REPO"
179193
180194
# use coder CLI to clone and install dotfiles
181-
coder dotfiles -y ${var.dotfiles_uri}
182-
195+
echo "Cloning dotfiles..."
196+
coder dotfiles -y "$DOTFILES_URI"
183197
EOT
184198
}
185199
```
186200

201+
The startup script can contain important steps that must succeed for the workspace to be usable. For this reason, we recommend putting `set -e` (exit on error) at the top of the script so that the user is notified when something goes wrong, and appending `|| true` to commands that are allowed to fail. These are not shown in the example above because, while useful, they need to be used with care. For more assurance, you can use [shellcheck](https://www.shellcheck.net) to find bugs in the script and [`set -euo pipefail`](https://wizardzines.com/comics/bash-errors/) to exit on errors, on unset variables, and on failures inside pipelines.
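
As a minimal sketch of how `set -e` and `|| true` interact (assuming a POSIX-compatible `sh`; the path below is a hypothetical placeholder and `false` stands in for a failing command):

```shell
#!/bin/sh
# The script body is held in a variable and run in a separate `sh` process
# only so that this example can report the outcome. In a real startup script,
# `set -e` would simply be the first line.
script='
set -e
# Optional step: listing a hypothetical path fails, but `|| true` lets the
# script continue.
ls /hypothetical/optional/path >/dev/null 2>&1 || true
echo "optional step done"
# Required step: `false` stands in for a command that fails. Because of
# `set -e`, the script stops here.
false
echo "never reached"
'

status=0
output=$(sh -c "$script") || status=$?
echo "output: $output"
echo "exit status: $status"
```

The optional step fails without stopping the script, while the failing required step ends it with a non-zero exit status before the last `echo` runs.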
202+
203+
We also recommend that startup scripts do not run forever. Long-running processes, like code-server, should be run in the background. This is usually achieved by adding `&` to the end of the command. For example, `sleep 10 &` will run the command in the background and allow the startup script to complete.
204+
205+
> **Note:** If a backgrounded command (`&`) writes to stdout or stderr, the startup script will not complete until the command completes or closes the file descriptors. To avoid this, you can redirect the stdout and stderr to a file. For example, `sleep 10 >/dev/null 2>&1 &` will redirect the stdout and stderr to `/dev/null` (discard) and run the command in the background.
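
The note above can be sketched as follows. This is an illustration only: `sleep 5` stands in for a long-running process, and the log file path is a hypothetical placeholder.

```shell
#!/bin/sh
echo "Starting background task..."
# `sleep 5` stands in for a long-running process. Both output streams are
# redirected to a (hypothetical) log file, so the process does not hold the
# startup script's own stdout or stderr open.
sleep 5 >/tmp/example-background-task.log 2>&1 &
task_pid=$!
echo "Background task running with PID $task_pid; the script continues."
```

Saving `$!` right after the `&` keeps the process ID available in case a later step needs to check on or stop the background process.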
206+
207+
Finally, notice how each step in the example starts with `echo "..."` to tell the user what is happening. This is especially useful when the startup script behavior is set to blocking, because users can see why they are waiting to access their workspace.
208+
209+
#### `startup_script_behavior`
210+
211+
Use the Coder agent's `startup_script_behavior` to switch between `blocking` and `non-blocking` (default). The blocking behavior is recommended for most use cases because it lets the startup script complete before the user accesses the workspace. For example, suppose the startup script checks out a very large repo. If the startup script is non-blocking, the user may log in via SSH or open the IDE before the repo is fully checked out, which can lead to a poor user experience.
212+
213+
Whichever behavior is enabled, the user can still choose to override it by specifying the appropriate flags (or environment variables) in the CLI when connecting to the workspace. For example, `coder ssh --no-wait` will connect to the workspace without waiting for the startup script to complete.
214+
215+
```hcl
216+
resource "coder_agent" "coder" {
217+
os = "linux"
218+
arch = "amd64"
219+
startup_script_behavior = "blocking"
220+
startup_script = "echo 'Starting...'"
}
221+
```
222+
187223
### Start/stop
188224

189225
[Learn about resource persistence in Coder](./resource-persistence.md)
@@ -372,37 +408,69 @@ practices:
372408
- The Coder agent shutdown script logs are typically stored in `/tmp/coder-shutdown-script.log`
373409
- This can also happen if the websockets are not being forwarded correctly when running Coder behind a reverse proxy. [Read our reverse-proxy docs](https://coder.com/docs/v2/latest/admin/configure#tls--reverse-proxy)
374410

375-
### Agent does not become ready
411+
### Startup script issues
376412

377-
If the agent does not become ready, it means the [startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script) is still running or has exited with a non-zero status. This also means the [login before ready](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#login_before_ready) option hasn't been set to true.
413+
Depending on the contents of the [startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script), and whether or not the [startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior) is set to blocking or non-blocking, you may notice issues related to the startup script. In this section we will cover common scenarios and how to resolve them.
378414

379-
```console
380-
$ coder ssh myworkspace
381-
⢄⡱ Waiting for [agent] to become ready...
382-
```
415+
#### Unable to access workspace, startup script is still running
383416

384-
To troubleshoot readiness issues, check the agent logs as suggested above. You can connect to the workspace using `coder ssh` with the `--no-wait` flag. Please note that while this makes login possible, the workspace may be in an incomplete state.
417+
If you're unable to access your workspace because the [startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script) is still running, it means the [startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior) option is set to blocking or you have enabled the `--wait` option (for example, with `coder ssh` or `coder config-ssh`). In that case, you can always access the workspace through the web terminal, or via SSH using the `--no-wait` option. If the startup script runs longer than it should, or never completes, you can try to [debug the startup script](#debugging-the-startup-script) to resolve the issue. Alternatively, you can force the startup script to exit by terminating the processes it started or the startup script itself (on Linux, `ps` and `kill` are useful tools).
385418

386-
```console
387-
$ coder ssh myworkspace --no-wait
419+
For tips on how to write a startup script that doesn't run forever, see the [`startup_script`](#startup_script) section.
388420

389-
> The workspace is taking longer than expected to get
390-
ready, the agent startup script is still executing.
391-
See troubleshooting instructions at: [...]
421+
Template authors can also set the [startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior) option to non-blocking, which will allow users to access the workspace while the startup script is still running. Note that the workspace must be updated after changing this option.
392422

393-
user@myworkspace $
394-
```
423+
Useful commands:
395424

396-
If the startup script is expected to take a long time, you can try raising the timeout defined in the template:
425+
- `coder ssh --wait my-workspace` or `export CODER_SSH_WAIT=true; ssh coder.my-workspace`
426+
- `coder ssh --no-wait my-workspace` or `export CODER_SSH_NO_WAIT=true; ssh coder.my-workspace`
427+
- `coder config-ssh --wait` or `coder config-ssh --no-wait`
397428

398-
```tf
399-
resource "coder_agent" "main" {
400-
# ...
401-
login_before_ready = false
402-
startup_script_timeout = 1800 # 30 minutes in seconds.
403-
}
429+
#### Your workspace may be incomplete
430+
431+
If you see a warning that your workspace may be incomplete, programs, files, or settings may be missing from your workspace. This can happen if the [startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script) is still running or has exited with a non-zero status (see [startup script exited with an error](#startup-script-exited-with-an-error)). No action is necessary, but you may want to check the [startup script logs](#debugging-the-startup-script) to see if there are any issues.
432+
433+
#### Session was started before the startup script finished (web terminal)
434+
435+
The web terminal may show this message if it was started before the [startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script) finished, but the startup script has since finished. This message can safely be dismissed. However, be aware that your preferred shell or dotfiles may not yet be activated for this shell session. You can either start a new session or source your dotfiles manually. Note that starting a new session terminates any commands running in the terminal, and you may lose unsaved work.
436+
437+
Examples for activating your preferred shell or sourcing your dotfiles:
438+
439+
- `exec zsh -l`
440+
- `source ~/.bashrc`
441+
442+
#### Startup script exited with an error
443+
444+
When the [startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script) exits with an error, it means the last command run by the script failed. When `set -e` is used, this means that any failing command will immediately exit the script and the remaining commands will not be executed. This also means that [your workspace may be incomplete](#your-workspace-may-be-incomplete). If you see this error, you can check the [startup script logs](#debugging-the-startup-script) to figure out what the issue is.
445+
446+
Common causes for startup script errors:
447+
448+
- A missing command or file
449+
- A command that fails due to missing permissions
450+
- Network issues (e.g., unable to reach a server)
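
Some of these causes can be surfaced early with explicit checks. The following is a minimal sketch, assuming a POSIX-compatible shell; `require_command` is a hypothetical helper name (not part of Coder), and the missing tool name is made up for illustration.

```shell
#!/bin/sh
# require_command is a hypothetical helper: it reports a missing command
# clearly instead of letting a later step fail with a less obvious error.
require_command() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Missing required command: $1" >&2
    return 1
  fi
}

require_command sh && echo "sh is available"
require_command hypothetical-missing-tool-xyz || echo "hypothetical-missing-tool-xyz is not available"
```

In a startup script that uses `set -e`, calling `require_command` on its own line would stop the script at the first missing tool, with the reason visible in the startup log.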
451+
452+
#### Debugging the startup script
453+
454+
The simplest way to debug the [startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script) is to open the workspace in the Coder dashboard and click "Show startup log" (if not already visible). This will show all the output from the script. Another option is to view the log file inside the workspace (usually `/tmp/coder-startup-script.log`). If the logs don't indicate what's going on or going wrong, you can increase verbosity by adding `set -x` to the top of the startup script (note that this will show all commands run and may output sensitive information). Alternatively, you can add `echo` statements to show what's going on.
455+
456+
Here's a short example of an informative startup script:
457+
458+
```sh
459+
echo "Running startup script..."
460+
echo "Run: long-running-command"
461+
/path/to/long-running-command
462+
status=$?
463+
echo "Done: long-running-command, exit status: ${status}"
464+
if [ $status -ne 0 ]; then
465+
echo "Startup script failed, exiting..."
466+
exit $status
467+
fi
404468
```
405469

470+
> **Note:** We don't use `set -x` here because we're manually echoing the commands. This protects against sensitive information being shown in the log.
471+
472+
This script tells us what command is being run and what the exit status is. If the exit status is non-zero, it means the command failed and we exit the script. Since we are manually checking the exit status here, we don't need `set -e` at the top of the script to exit on error.
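
If several commands need the same treatment, the pattern could be wrapped in a function. This is only a sketch: `run_step` is a hypothetical helper name, and `true` and `false` stand in for real commands. Note that it echoes the command's arguments, so it should not be used with commands that take secrets as arguments.

```shell
#!/bin/sh
# run_step is a hypothetical helper: it announces a command, runs it, and
# reports the exit status, returning that status to the caller.
run_step() {
  echo "Run: $*"
  status=0
  "$@" || status=$?
  echo "Done: $*, exit status: ${status}"
  return "$status"
}

run_step true
run_step false || echo "A real startup script would exit here with status $?"
```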
473+
406474
## Template permissions (enterprise)
407475

408476
Template permissions can be used to give users and groups access to specific
