
Add health-check for coder_apps #2662

Closed
@sharkymark

Description


edited by @bpmct

Problem statement

Throughout Coder's documentation and examples, the startup_script is used to install web IDEs such as code-server, JetBrains Projector, and JupyterLab onto the workspace. Users then connect to them via links in the dashboard, which the template defines with the coder_app resource.

When the workspace starts, it can take 15-60 seconds for the IDEs to install before a user can reach them. If a user clicks an app before it has loaded, they get a 404 page:

[Screenshot: 404 page shown when the app is opened before it has loaded]

However, when you refresh 30 seconds later, it works!

[Screenshot: the same app loading successfully after a refresh]

Definition of done

From the dashboard, the app cannot be opened until the health check passes, or the app is eventually deemed unhealthy.

Prior art

Coder Classic implements health checks for generic apps, supporting both exec- and HTTP-based checks, but with a hardcoded interval, timeout, and unhealthy threshold. There is no loading indicator in the UI; instead, when an app is clicked, the tab keeps loading for up to x seconds until the health check passes or fails.

Ideas

@bpmct: Unlike in Coder Classic, I think health checks here would benefit from a configurable unhealthy threshold, since applications are often installed at runtime, which leads to longer-than-normal "wait times." GCP follows this pattern.

  • Some apps only depend on a process starting (e.g. code-server, http.server), so they can be considered unhealthy after 15 seconds
  • Some apps depend on IntelliJ being downloaded and could take 60+ seconds

Add health check to coder_app resource

resource "coder_app" "code_server" {
  # ...
+ health_check {
+   # actual schema TBD
+   enabled             = true
+   unhealthy_threshold = "60s"
+ }
}

Until the unhealthy threshold is reached, a loading indicator could make it clear to users that the app is still loading, until the health check either passes or times out. Perhaps the app is also unclickable during this time.

[Screenshot: mockup of an app with a loading indicator]

After the threshold is exceeded (e.g. 3 minutes), the app can show a red/error indicator if the health check never passed:

[Screenshot: mockup of an app with a red/error indicator]
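The three states described above (loading, healthy, unhealthy after the threshold) can be sketched as a simple polling loop. This is a minimal illustration in Python, not Coder's implementation: it assumes an HTTP-style check that polls at a fixed interval and gives up after a configurable unhealthy threshold. The names `AppHealth`, `http_probe`, and `wait_for_healthy` are hypothetical, and the `clock`/`sleep` parameters exist only so the loop can be tested without real waiting.

```python
import http.client
import time
import urllib.request
from enum import Enum
from typing import Callable


class AppHealth(Enum):
    INITIALIZING = "initializing"  # show loading indicator; app not clickable
    HEALTHY = "healthy"            # check passed; app can be opened
    UNHEALTHY = "unhealthy"        # threshold exceeded; show red/error indicator


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout_s`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), refused connections and timeouts all
        # derive from OSError; malformed responses raise HTTPException.
        return False


def wait_for_healthy(
    probe: Callable[[], bool],
    unhealthy_threshold_s: float,
    interval_s: float = 1.0,
    on_status: Callable[[AppHealth], None] = lambda status: None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> AppHealth:
    """Poll `probe` every `interval_s` seconds until it passes (HEALTHY) or
    `unhealthy_threshold_s` seconds elapse without a pass (UNHEALTHY)."""
    start = clock()
    on_status(AppHealth.INITIALIZING)  # dashboard shows the loading state
    while True:
        if probe():
            on_status(AppHealth.HEALTHY)
            return AppHealth.HEALTHY
        if clock() - start >= unhealthy_threshold_s:
            on_status(AppHealth.UNHEALTHY)
            return AppHealth.UNHEALTHY
        sleep(interval_s)
```

For example, `wait_for_healthy(lambda: http_probe("http://localhost:13337/healthz"), unhealthy_threshold_s=60)` would wait up to a minute for an app listening on that (hypothetical) URL. Making only the threshold configurable matches the idea above: a process-start app like code-server could use 15 seconds, while an app that downloads IntelliJ might need 60 or more.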

Metadata


Labels

api (Area: HTTP API), site (Area: frontend dashboard)
