"Running" on workspace list isn't helpful if the agent is disconnected #6461

bpmct · 2023-03-06T15:40:49Z

Could we have a combined state which displays instead whether the agent is connected, if it's still running the startup script, whether it's reachable/unreachable, etc?

mafredri · 2023-03-07T12:03:45Z

I'm thinking about how we should represent this in the UI/CLI. What makes most sense to me is to have a workspace health field since we still want to differentiate between a started/stopped state. The two columns could essentially be combined into one since health for a stopped workspace doesn't make much sense, but this should illustrate the general idea.

Before:

After:

This is a bit trickier to express in the CLI, since we don't have popups to give extra information.

Perhaps we can simply add a new column:

❯ coder list
WORKSPACE      TEMPLATE  STATUS   HEALTH     LAST BUILT  OUTDATED  STARTS AT                         STOPS AFTER
mafredri/test  docker    Stopped  N/A        21d23h      true      -                                 8h
mafredri/work  docker    Started  Unhealthy  1d          false     9:30AM Sun-Sat (Europe/Helsinki)  14h

The user can then use coder show to find out why it's unhealthy:

❯ coder show work
┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│ RESOURCE                    STATUS       LIFECYCLE       VERSION          ACCESS             │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│ docker_container.workspace                                                                   │
│ └─ main (linux, amd64)      ⦿ connected  x startup_failed  v0.18.1+e3a4861   coder ssh work  │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│ docker_image.workspace                                                                       │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│ docker_image.base                                                                            │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│ docker_image.go                                                                              │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│ docker_image.template                                                                        │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│ docker_volume.home_volume                                                                    │
└──────────────────────────────────────────────────────────────────────────────────────────────┘

bpmct · 2023-03-13T04:21:50Z

I'm definitely in favor of displaying more of a single lifecycle state to users.

BrunoQuaresma · 2023-03-13T17:03:07Z

We could do the following:

Show a green badge with the label "Healthy" when the workspace is ok and all the agents are ok too.
Show a red or orange badge with the label "Unhealthy" when the workspace is not ok or the agents are not running. When the user hovers over the badge we display a popover showing the agents and their statuses. We could also add a message like "Looks like the workspace is not running or its agents are failing. You may solve this by restarting the workspace."
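Here is a rough Go sketch of that rule. The agentInfo and badge types are hypothetical, and the real badge would live in the frontend. The badge is green "Healthy" only when the workspace is running and every agent is OK. Otherwise it is red "Unhealthy", and its popover lists each agent's status followed by the suggested restart message.

```go
package main

import "fmt"

// agentInfo is a hypothetical summary of one agent as the badge sees it.
type agentInfo struct {
	Name   string
	Status string // e.g. "connected", "disconnected"
	OK     bool
}

// badge is what the UI would render: a label, a color and popover lines.
type badge struct {
	Label   string
	Color   string
	Popover []string
}

const restartHint = "Looks like the workspace is not running or its agents are failing. You may solve this by restarting the workspace."

// workspaceBadge is green "Healthy" only when the workspace is running and
// every agent is OK; otherwise it is a red "Unhealthy" badge whose popover
// lists each agent's status, followed by the restart hint.
func workspaceBadge(running bool, agents []agentInfo) badge {
	healthy := running
	for _, a := range agents {
		if !a.OK {
			healthy = false
		}
	}
	if healthy {
		return badge{Label: "Healthy", Color: "green"}
	}
	var lines []string
	for _, a := range agents {
		lines = append(lines, fmt.Sprintf("%s: %s", a.Name, a.Status))
	}
	lines = append(lines, restartHint)
	return badge{Label: "Unhealthy", Color: "red", Popover: lines}
}

func main() {
	b := workspaceBadge(true, []agentInfo{{Name: "main", Status: "disconnected"}})
	fmt.Println(b.Label)
	for _, l := range b.Popover {
		fmt.Println("  " + l)
	}
}
```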

bpmct · 2023-03-14T17:25:21Z

> When the user hovers over the badge we display a popover showing the agents and their statuses. We could also add a message like "Looks like the workspace is not running or its agents are failing. You may solve this by restarting the workspace."

I'm not a fan of hovering to see additional error details, as not every user will intuitively hover. For example, the agent troubleshooting URL is hard to find. That's fair on the workspace list page, but on the workspace page itself I think we should promote restarting with a big button.

BrunoQuaresma · 2023-06-21T12:13:40Z

@mafredri it looks like a FE task. Is there anything you want to do in the BE? Or are you interested in taking the FE as well?

BrunoQuaresma · 2023-06-21T12:18:14Z

I think #6462 is related to this as well.

mafredri · 2023-06-23T11:02:25Z

> it looks like a FE task, is there anything you want to do in the BE? Or are you interested in taking the FE as well?

@BrunoQuaresma happy to defer FE to you. As for the BE, I think potential requirements may be easier to identify after we have an implementation of this. For example, do we want a single health field on the agent (healthy/unhealthy/etc.) to simplify the FE logic?

BrunoQuaresma · 2023-06-23T12:36:40Z

@mafredri this would be great!

mafredri · 2023-06-28T11:22:57Z

Alright @BrunoQuaresma and @bpmct, we can add a health field to the agent, and another one to the workspace (which is a summary of the agent health fields).

We could start off with a simple enum, healthy and unhealthy, and then add more states as we need them.
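Here is a minimal Go sketch of that idea. It assumes a hypothetical string-based Health enum with just the two starting values, and it derives the workspace-level value from the agents instead of storing it separately. All names here are illustrative.

```go
package main

import "fmt"

// Health is a hypothetical enum for the proposed field. It starts with two
// values; more can be added later if they turn out to be needed.
type Health string

const (
	HealthHealthy   Health = "healthy"
	HealthUnhealthy Health = "unhealthy"
)

// Agent carries its own health field (illustrative only).
type Agent struct {
	Name   string
	Health Health
}

// workspaceHealth summarizes the agent health fields: the workspace is
// healthy only if every agent is healthy.
func workspaceHealth(agents []Agent) Health {
	for _, a := range agents {
		if a.Health != HealthHealthy {
			return HealthUnhealthy
		}
	}
	return HealthHealthy
}

func main() {
	agents := []Agent{
		{Name: "main", Health: HealthHealthy},
		{Name: "sidecar", Health: HealthUnhealthy},
	}
	fmt.Println(workspaceHealth(agents))
}
```

A workspace with no agents comes out healthy here. A real implementation would have to decide how to treat that case.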

The only question in my mind is, how can the UI help the user understand why a workspace is unhealthy based on the new health field? Should we add more values to the enum or would we just end up reimplementing the status/lifecycle state fields? Then again @BrunoQuaresma's proposal is good in that this doesn't matter and the simple state is enough.

To document my thought process, and possibly prompt some ideas from all of you, here are the (currently possible) enumerated states of an agent (status:lifecycle state):

connecting:created
timeout:created
connected:created
connected:starting
(connected:start_timeout)
connected:start_error
connected:ready
connected:shutting_down
(connected:shutdown_timeout)
connected:shutdown_error
connected:off
disconnected:created
disconnected:starting
(disconnected:start_timeout)
disconnected:start_error
disconnected:ready
disconnected:shutting_down
(disconnected:shutdown_timeout)
disconnected:shutdown_error
disconnected:off
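The list above has a simple structure, which a short Go sketch can reproduce. An agent that is still connecting, or has timed out while connecting, only appears with the created lifecycle state. A connected or disconnected agent can be in any of the nine lifecycle states. The state names come from the list, and the code itself is only illustrative.

```go
package main

import "fmt"

// Lifecycle states as they appear in the list above, in the same order.
var lifecycles = []string{
	"created", "starting", "start_timeout", "start_error", "ready",
	"shutting_down", "shutdown_timeout", "shutdown_error", "off",
}

// possibleStates reproduces the list: connecting and timeout only pair
// with "created", while connected and disconnected pair with every
// lifecycle state.
func possibleStates() []string {
	states := []string{"connecting:created", "timeout:created"}
	for _, status := range []string{"connected", "disconnected"} {
		for _, lc := range lifecycles {
			states = append(states, status+":"+lc)
		}
	}
	return states
}

func main() {
	for _, s := range possibleStates() {
		fmt.Println(s)
	}
}
```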

States that I would consider are healthy:

connecting:created
connected:created
connected:starting
connected:ready

There's one state representing a timeout, which is effectively an unknown state. We don't know whether it's healthy or unhealthy, nor whether it ever will be:

timeout:created

There are two states representing soft timeouts, these could be considered either healthy or unhealthy:

(connected:start_timeout)
(connected:shutdown_timeout)

The rest could all be considered unhealthy.

If we ever want to support restarting agents on request, even these states could be considered healthy:

connected:shutting_down
connected:off
disconnected:off
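Here is a minimal Go sketch of the classification above. It maps a status:lifecycle pair to healthy, unhealthy or unknown. The Verdict type and the two Options flags are hypothetical knobs standing in for the open questions: whether soft timeouts count as healthy, and whether agents can be restarted on request. They are not existing Coder settings.

```go
package main

import "fmt"

// Verdict is a hypothetical three-way result, since some states are
// neither clearly healthy nor clearly unhealthy.
type Verdict string

const (
	Healthy   Verdict = "healthy"
	Unhealthy Verdict = "unhealthy"
	Unknown   Verdict = "unknown"
)

// Options captures the two judgment calls discussed above (illustrative).
type Options struct {
	SoftTimeoutsHealthy bool // treat connected:start_timeout/shutdown_timeout as healthy
	AgentRestartable    bool // future: agents can be restarted on request
}

// classify maps a "status:lifecycle" pair to a verdict following the lists above.
func classify(state string, opts Options) Verdict {
	switch state {
	case "connecting:created", "connected:created", "connected:starting", "connected:ready":
		return Healthy
	case "timeout:created":
		return Unknown
	case "connected:start_timeout", "connected:shutdown_timeout":
		if opts.SoftTimeoutsHealthy {
			return Healthy
		}
		return Unhealthy
	case "connected:shutting_down", "connected:off", "disconnected:off":
		if opts.AgentRestartable {
			return Healthy
		}
		return Unhealthy
	}
	// The rest could all be considered unhealthy.
	return Unhealthy
}

func main() {
	opts := Options{}
	for _, s := range []string{"connected:ready", "timeout:created", "connected:start_timeout", "disconnected:ready"} {
		fmt.Printf("%-24s %s\n", s, classify(s, opts))
	}
}
```

With both flags off, this is the strictest reading. Only the four healthy states pass, timeout:created is unknown, and everything else is unhealthy.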

bpmct · 2023-06-29T10:32:56Z

I don't think we should add more values to the enum. Displaying healthy/unhealthy will be easier for the user to understand IMO.

However, I think we could surface the "reason" as a separate property in the API response, perhaps one that shows up when you hover over "unhealthy." I think https://dev.coder.com/api/v2/debug/health has a good design for this.
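The API shape could look something like this Go sketch: a boolean the UI can render as healthy/unhealthy, plus an optional reason string. The health value would be computed per request from the agent's status and lifecycle state, not stored. The AgentHealth type, its JSON field names and the example reason are assumptions for illustration, loosely inspired by /debug/health. They are not a confirmed API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentHealth is a hypothetical API shape: a boolean the UI can render as
// healthy/unhealthy, plus a human-readable reason that is only populated
// when something is wrong (e.g. shown when hovering "unhealthy").
type AgentHealth struct {
	Healthy bool   `json:"healthy"`
	Reason  string `json:"reason,omitempty"`
}

// encode marshals a health value the way an API response would carry it.
func encode(h AgentHealth) string {
	b, err := json.Marshal(h)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(AgentHealth{Healthy: true}))
	fmt.Println(encode(AgentHealth{Healthy: false, Reason: "agent is not connected"}))
}
```

Because of omitempty, a healthy agent encodes as {"healthy":true}. An unhealthy one carries its reason, e.g. {"healthy":false,"reason":"agent is not connected"}.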

I assume we're not storing health information in the DB, right? Just computing it per-request based on workspace/agent state?

In unhealthy scenarios (e.g. agent disconnected), we should offer a button to restart the workspace.

Hope that helps!

bpmct · 2023-06-29T10:37:33Z

Also, I'm more than happy to help with the user-facing tooltip/coder show copy for each of those states you mentioned. Let me know what you think.

mafredri · 2023-06-30T16:34:53Z

I created a draft implementation of the health field, borrowing concepts from /debug/health, in #8280. Thoughts?

I still need to think some more about how this field will behave when e.g. the workspace is stopping. Stopping is technically healthy, but the workspace could be considered "unhealthy" from a usability perspective.


mafredri · 2023-07-10T10:59:40Z

@BrunoQuaresma I created a draft for this, but could use some guidance for the popover/tooltip: #8387. Also, does this align with your vision?

mtojek · 2023-07-17T07:38:16Z

Hey @mafredri and @BrunoQuaresma! I have just seen this issue, so I'm wondering if we can close this one. Otherwise, could you please document what is left?

mafredri · 2023-07-17T08:18:54Z

@mtojek I'd say we still want to improve coder list to show health status.

With coder list, we can't do the same as the WebUI (add an exclamation mark to status), so a separate healthy/unhealthy column might be ideal.

There's also coder show [workspace], but its whole UI needs to be changed (IMO). It's currently a "pretty table", which we've mostly moved away from, and I think it should show health and lifecycle state too. This could be a separate issue.


mafredri self-assigned this Mar 7, 2023

bpmct mentioned this issue Mar 7, 2023

Track startup script time + display estimate #6502

Closed

kconley-sq mentioned this issue Jun 5, 2023

Coder Server Doesn't Detect Dead Docker Workspace #7783

Closed

matifali added bug:px:needs-triage labels Jun 6, 2023

ammario removed the bug:px:needs-triage label Jun 6, 2023

matifali added the s2 label (broken use cases or features, with a workaround) Jun 26, 2023

mafredri mentioned this issue Jun 30, 2023

feat: add computed workspace and agent health fields to the api #8280

Merged

mafredri mentioned this issue Jul 10, 2023

feat(site): add workspace health badge to workspace list #8387

Closed


mafredri added a commit that referenced this issue Jul 10, 2023

feat(site): add workspace health badge to workspace list

597da97

Fixes #6461

mafredri mentioned this issue Jul 17, 2023

feat(cli): show workspace health in list #8541

Merged

mafredri added a commit that referenced this issue Jul 17, 2023

feat(cli): show workspace health in list

be55dcd

Ref #6461

mafredri added a commit that referenced this issue Jul 17, 2023

feat(cli): show workspace health in list (#8541)

dfac074

Ref #6461

mafredri mentioned this issue Jul 17, 2023

feat(cli): show workspace health in show-command #8548

Merged

mafredri added a commit that referenced this issue Jul 17, 2023

feat(cli): show workspace health in show

044e23e

Fixes #6461

mafredri closed this as completed in #8548 Jul 18, 2023
