
feature: provide a way to view all pending or running provisioner jobs #15084



Open
Tracked by #15428
johnstcn opened this issue Oct 15, 2024 · 20 comments
Labels
api Area: HTTP API need-backend Issues that need backend work

Comments

@johnstcn
Member

johnstcn commented Oct 15, 2024

Motivation

Relates to #15047

There is currently no way to get a top-level view of all pending or running provisioner jobs.
This is useful for administrators troubleshooting issues with hung or pending jobs in combination with tagged provisioner daemons. Having this available would have been helpful in surfacing the root cause of the linked issue.

Proposed Solution

  1. Add an API endpoint that lists all provisioner jobs. We should allow filtering at minimum by organization, status (pending, running, completed, failed) and time (created_at). In the case of running, completed, or failed jobs, also include information on the provisioners that handled the job.

     Required permissions: read all templates in org, read all workspaces in org.

  2. Add a CLI command that queries the above endpoint. We may also potentially expose a command to list all registered provisioners.

  3. Add a page in the UI to list all provisioner jobs. We should have enough contextual information available to link to associated workspace builds and template versions.

@johnstcn johnstcn added api Area: HTTP API need-backend Issues that need backend work labels Oct 15, 2024
@bpmct
Member

bpmct commented Oct 18, 2024

Can we also add the option to "force cancel" a job that is hanging? @johnstcn mentioned we have some existing endpoints:

https://coder.com/docs/reference/api/templates#cancel-template-version-by-id
https://coder.com/docs/reference/api/templates#cancel-template-version-dry-run-by-job-id
https://coder.com/docs/reference/api/builds#cancel-workspace-build

but the ability to click "cancel" or "delete" for a job with no matching provisioner on this page would be awesome. Happy to extract to another issue if needed.

@johnstcn johnstcn added the design needed Request for more beauty label Oct 21, 2024
@johnstcn
Member Author

Hey @chrifro, would you have some bandwidth for a quick UI mockup for part 3 of the above? It doesn't have to be anything overly fancy; even a simple table will probably suffice.

@chrifro

chrifro commented Oct 23, 2024

Hey @chrifro , would you have some bandwidth for a quick UI mockup for part 3 of the above? It doesn't have to be anything overly fancy; even a simple table will probably suffice.

Sure, I can help @johnstcn. But I need a bit more context first. I'm not fully familiar with the provisioner setup yet.

  • Where would you like to see the table of the provisioner jobs? Right now, they exist under organizations [1] but you'd like to view all of them in a central place, right? Would a new tab under "deployment" make sense for that?
  • Any other information that should be shown in the table besides: "filtering at minimum by organization, status (pending, running, completed, failed) and time (created_at). In the case of running, completed, or failed jobs, also include information on the provisioners that handled the job. Workspace builds and template versions." That's all?

[Image: screenshot of the current provisioner information at the organization level]

@johnstcn
Member Author

you'd like to view all of them in a central place, right? Would a new tab under "deployment" make sense for that?

That makes sense!

Any other information that should be shown in the table

I think that covers it for now. But it actually might make sense to defer this until we have a corresponding CLI command defined, as that would probably inform us of any information I've missed. What do you think?

@chrifro

chrifro commented Oct 23, 2024

Perfect, then I will draft something based on that and we can always iterate on it and adjust the shown information.

Does it work if I have something by end of week/early next week?

@johnstcn
Member Author

Does it work if I have something by end of week/early next week?

That sounds fine to me!

@chrifro

chrifro commented Oct 25, 2024

Here is a first draft. Note: the design is not final yet. I'd like to align on the information structure and direction first. Is this what you had in mind? Is any information missing? Feel free to comment directly in Figma.

include information on the provisioners that handled the job

Could you provide an example? What kind of information would you like to see here?

[Image: first draft of the provisioner jobs mockup]

@mafredri
Member

mafredri commented Oct 28, 2024

@johnstcn I've drafted up an example of the outputs for the CLI commands, do these match the expected behavior and is there any missing/excess data?

coder provisioner jobs list
CREATED AT            JOB                                   ORGANIZATION  STATUS     TYPE                     QUEUE  COMPLETED BY PROVISIONER            TAGS       AVAILABLE PROVISIONERS
2024-10-19T16:22:39Z  9739dfd4-d69e-4369-9eed-f6e8a0c5cb57  Coder         completed  workspace_build                 coder-provisioner-789964695d-48cw6             []
2024-10-24T16:27:56Z  6976d3c0-64b1-4ace-8c5f-6bb620fae1ee  Coder         failed     workspace_build                 coder-provisioner-789964695d-k285t  [foo=bar]  []
2024-10-26T16:28:20Z  b1547e9d-502b-47a3-ba0b-b05e56825b36  Coder         pending    template_version_import                                             [baz=qux]  []
2024-10-26T16:29:26Z  d0eb5f9f-81c7-4042-896e-055d095ae9cc  Coder         pending    template_version_import  1/1                                                   [coder-provisioner-789964695d-48cw6,coder-provisioner-789964695d-k285t]

Note: I considered adding template version/workspace name, but it could be looked up by job ID instead.

coder provisioner list
NAME                                ORGANIZATION  STATUS        CURRENT JOB                           PREVIOUS JOB                          PREVIOUS JOB STATUS  JOBS LAST 24H  TYPE      CREATED AT            LAST SEEN             VERSION     TAGS
coder-provisioner-789964695d-48cw6  Coder         connected     d0eb5f9f-81c7-4042-896e-055d095ae9cc  9739dfd4-d69e-4369-9eed-f6e8a0c5cb57  completed            2              psk       2024-10-19T16:22:39Z  2024-10-26T16:29:26Z  up to date  []
coder-provisioner-789964695d-k285t  Coder         connected                                           6976d3c0-64b1-4ace-8c5f-6bb620fae1ee  failed               2              psk       2024-10-26T16:28:20Z  2024-10-26T16:29:26Z  up to date  [foo=bar]
coder-provisioner-789964695d-r822d  Coder         disconnected                                        66f8290b-2ac0-4fd4-aa16-c926d8b8b61c  completed            0              psk       2024-10-26T16:28:20Z  2024-10-26T16:29:26Z  up to date  [baz=qux]

The outputs are fairly wide and I'd love to reduce them, but I felt all of this information is useful for tracking down potential issues.

@johnstcn
Member Author

johnstcn commented Oct 28, 2024

@mafredri I think we can show basic information by default but provide additional optional columns:

  • For provisioner jobs list, we could move "completed by provisioner" and "available provisioners" to optional columns not shown by default.
  • Similarly, for provisioner list, we could probably make "previous job" and "previous job status" both optional columns. Maybe also "jobs last 24h", but dealer's choice.

Apart from that, the basic gist of the output looks good to me! 👍

@matifali
Member

matifali commented Oct 28, 2024

Seeing the number of jobs in the queue for each provisioner will be nice too.

@mafredri
Member

  • For provisioner jobs list, we could move "completed by provisioner" and "available provisioners" to optional columns not shown by default.

I think showing available provisioners could be a good default, since that's what tells you immediately whether there's a problem. But maybe we just show a number, available provisioners = 2, instead of the list? Then it could probably be enabled by default. An alternative would be to show ? / 0/0 / [loading symbol] in the queue column, but I'm not a fan of requiring users to infer what that means.

  • Similarly, for provisioner list, we could probably make "previous job" and "previous job status" both optional columns. Maybe also "jobs last 24h", but dealer's choice.

I agree, maybe those columns are not even needed. We could add another column instead: last job = 3m ago; a steadily increasing value here may suggest something is wrong.


Seeing a list of a number of jobs in the queue will be nice to have.

@matifali could you expand on what you're thinking of? Is it different from coder provisioner jobs list filtered on pending? Are you thinking CLI or Web UI? Or something else?

@bpmct
Member

bpmct commented Nov 21, 2024

from sprint planning:

  • we can start with backend, then follow up with the web UI. we may not complete the frontend by the end of this sprint

@matifali
Member

@matifali could you expand on what you're thinking of? Is it different from coder provisioner jobs list filtered on pending? Are you thinking CLI or Web UI? Or something else?

What I mean is reporting the total number of jobs in the queue. It would make more sense in the UI and can be skipped in the CLI.

@bpmct
Member

bpmct commented Jan 2, 2025

@mafredri @chrifro It'd make sense to visualize the queue in the UI for cases where a provisioner is not ready to pick up a job, or where there are no compatible provisioners running. Since one provisioner can only pick up one job at a time, I'm not sure how a drill-down by provisioner would work.

The CLI output totally makes sense to me. Maybe the UI can mirror the same logical structure?

@mafredri
Member

mafredri commented Jan 2, 2025

@bpmct that makes sense to me, provisioners and provisioner jobs are intertwined but also distinct, having two separate views would be ideal. 👍🏻

@bartekgatzcoder
Contributor

@bpmct does showing a queue of jobs for provisioners also imply that the admin can "force cancel" a queued job, too (despite it being perfectly healthy, at least in theory)?

@bpmct
Member

bpmct commented Jan 2, 2025

Yep! I imagine that could be a quick prerequisite. Jobs that are assigned to a provisioner can be cancelled, but I don't think we actually support cancelling or deleting "pending" jobs that haven't been assigned yet, although we have had many requests for it.

@matifali
Member

matifali commented Jan 3, 2025

I don't think we actually support cancelling "pending" jobs that have been unassigned can be deleted/cancelled, but we have had many requests for it.

Yes. See: #12331

@chrifro

chrifro commented Jan 3, 2025

It'd make sense to visualize the queue in the UI for cases where a provisioner is not ready pick up a job, or if there are no compatible provisioners running.

By that, do you mean adding more details to the pending status? Do you have an example of what kind of information should be shown there?

Based on this early mockup, what other changes would you like to see? What information is still missing?

mafredri added a commit that referenced this issue Jan 13, 2025
mafredri added a commit that referenced this issue Jan 14, 2025
mafredri added a commit that referenced this issue Jan 14, 2025
mafredri added a commit that referenced this issue Jan 14, 2025
mafredri added a commit that referenced this issue Jan 14, 2025
mafredri added a commit that referenced this issue Jan 14, 2025
mafredri added a commit that referenced this issue Jan 14, 2025
mafredri added a commit that referenced this issue Jan 14, 2025
mafredri added a commit that referenced this issue Jan 14, 2025
mafredri added a commit that referenced this issue Jan 15, 2025
mafredri added a commit that referenced this issue Jan 16, 2025
mafredri added a commit that referenced this issue Jan 16, 2025
mafredri added a commit that referenced this issue Jan 20, 2025
mafredri added a commit that referenced this issue Jan 27, 2025
aslilac pushed a commit that referenced this issue Jan 29, 2025
mafredri added a commit that referenced this issue Feb 6, 2025
This change adds metadata to provisioner jobs to help with rendering
related templates and workspaces in the UI.

Updates #15084
mafredri added a commit that referenced this issue Feb 13, 2025
This change adds two new filters to the provisionerjobs endpoint: id
(array) and tags (map).

Updates #15084
Updates #15192
Related #16532
mafredri added a commit that referenced this issue Feb 14, 2025
…16558)

This change adds provisioner daemon ID filter to the provisioner daemons
endpoint, and also implements the limiting to 50 results.

Test coverage is greatly improved, and template information for jobs
associated with the daemon was also fixed.

Updates #15084
Updates #15192
Related #16532
6 participants