Skip to content

feat(coderd): add provisioner_daemons to /debug/health endpoint #11393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jan 8, 2024

Conversation

johnstcn
Copy link
Member

@johnstcn johnstcn commented Jan 3, 2024

Adds a healthcheck for provisioner daemons to /debug/health endpoint.

Part of #10676

@johnstcn johnstcn self-assigned this Jan 3, 2024
@johnstcn johnstcn force-pushed the cj/provisionerd-healthcheck branch from 1b2bc9a to 53ed901 Compare January 4, 2024 16:33
@johnstcn johnstcn changed the base branch from main to cj/util-apiversion January 4, 2024 16:34
Base automatically changed from cj/util-apiversion to main January 5, 2024 10:22
@johnstcn johnstcn force-pushed the cj/provisionerd-healthcheck branch from 53ed901 to b83013b Compare January 5, 2024 12:19
@johnstcn johnstcn changed the title WIP: add version healthcheck for provisioner daemons feat(coderd): add provisioner_daemons to /debug/health endpoint Jan 5, 2024
@johnstcn johnstcn marked this pull request as ready for review January 5, 2024 12:31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you understand why this keeps getting marked as changed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the timestamp on the files changing; I haven't yet figured out a way to ignore this.

continue
}
// Daemon has gone away, skip.
if now.Sub(daemon.LastSeenAt.Time) > (opts.StaleInterval) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud, is it possible that a daemon reports an error, but after that, the last seen isn't updated. Then after stale interval the error disappears, and perhaps nobody ever notices it?

Also, don't we want to apply the same rules to r.ProvisionerDaemons?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that a daemon reports an error, but after that, the last seen isn't updated. Then after stale interval the error disappears, and perhaps nobody ever notices it?

Yes, and this is by design. If a provisioner daemon connects at some point, has a transient error, and then never heartbeats, I don't think it makes sense to warn about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I was entertaining the possibility of some error state resulting in the provisioner exiting, crashing or plain stopping communication with server. But I didn't have anything concrete in mind, so this is fine.

ProvisionerDaemons []codersdk.ProvisionerDaemon `json:"provisioner_daemons"`
}

type ProvisionerDaemonsReportDeps struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels more like Config instead of Deps, but I'll leave it up to you.

@johnstcn johnstcn force-pushed the cj/provisionerd-healthcheck branch from 62d42e8 to 19e896a Compare January 8, 2024 09:10
@johnstcn johnstcn merged commit 04fd96a into main Jan 8, 2024
@johnstcn johnstcn deleted the cj/provisionerd-healthcheck branch January 8, 2024 09:29
@github-actions github-actions bot locked and limited conversation to collaborators Jan 8, 2024
@mtojek
Copy link
Member

mtojek commented Jan 8, 2024

@johnstcn Nice contribution! Could you please link it with the umbrella issue and fill in the PR description?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants