Description
edited by @bpmct
Problem statement
Throughout Coder's documentation and examples, the startup_script is used to install web IDEs onto the workspace, such as code-server, JetBrains Projector, JupyterLab, etc. From there, users connect via links in the dashboard. In the template, each link is defined via the coder_app resource.
When the workspace starts, it takes 15-60s for the IDEs to install before a user can get to the page. If they click the link before the app has loaded, they get a 404 page.
However, when you refresh 30 seconds later, it works!
Definition of done
From the dashboard, the app cannot be opened until the health check passes, or the app is eventually deemed unhealthy.
Prior art
Health checks are implemented for generic apps in Coder Classic with support for exec and http based health checks, but with a hardcoded interval/timeout/unhealthy threshold. There is no loading indicator in the UI, but when an app is clicked, the tab loads for x seconds until the health check passes/fails.
Ideas
@bpmct: Unlike health checks in Coder Classic, I think health checks would benefit from a configurable unhealthy threshold since applications will often be installed during runtime, leading to longer-than-normal "wait times." GCP follows this pattern.
- Some apps may only depend on a process starting (e.g. code-server, http.server), so they can be considered unhealthy after 15 seconds
- Some apps may depend on IntelliJ downloading and could take 60+ seconds
Add health check to coder_app resource
resource "coder_app" "code_server" {
  # ...
+ health_check {
+   # actual schema TBD
+   enabled = true
+   unhealthy_threshold = "60s"
+ }
}
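As a rough illustration of why the threshold should be configurable, the two cases listed under Ideas could be given different values per app. This is only a sketch: the health_check block and its enabled and unhealthy_threshold fields are the proposal from this issue (schema still TBD), not an existing provider API. The "intellij" resource name and the duration string format are assumptions made for the example.

```hcl
# Hypothetical sketch: health_check and its fields follow the proposal in
# this issue and are not an existing provider schema.

resource "coder_app" "code_server" {
  # ... (other app settings elided)
  health_check {
    enabled             = true
    unhealthy_threshold = "15s" # only waits on a process starting
  }
}

# "intellij" is an illustrative name, not taken from an existing template.
resource "coder_app" "intellij" {
  # ...
  health_check {
    enabled             = true
    unhealthy_threshold = "3m" # leaves time for the IDE to download
  }
}
```

With values like these, a fast-starting app would be flagged as unhealthy quickly, while an app that has to download a large IDE would stay in the loading state longer before being marked as failed.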
Before the unhealthy threshold is reached, a loading indicator could be shown, making it clear to users that the app is still loading until the health check passes or times out. Perhaps the app is also unclickable during this time.
After the threshold is exceeded (e.g. 3 mins), the app can have a red/error indicator if the health check never passes.