-
Notifications
You must be signed in to change notification settings - Fork 929
feat(coderd): add provisioner_daemons to /debug/health endpoint #11393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1b2bc9a
to
53ed901
Compare
53ed901
to
b83013b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you understand why this keeps getting marked as changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the timestamp on the files changing; I haven't yet figured out a way to ignore this.
continue | ||
} | ||
// Daemon has gone away, skip. | ||
if now.Sub(daemon.LastSeenAt.Time) > (opts.StaleInterval) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud, is it possible that a daemon reports an error, but after that, the last seen isn't updated. Then after stale interval the error disappears, and perhaps nobody ever notices it?
Also, don't we want to apply the same rules to r.ProvisionerDaemons
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that a daemon reports an error, but after that, the last seen isn't updated. Then after stale interval the error disappears, and perhaps nobody ever notices it?
Yes, and this is by design. If a provisioner daemon connects at some point, has a transient error, and then never heartbeats, I don't think it makes sense to warn about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I was entertaining the possibility of some error state resulting in the provisioner exiting, crashing or plain stopping communication with server. But I didn't have anything concrete in mind, so this is fine.
ProvisionerDaemons []codersdk.ProvisionerDaemon `json:"provisioner_daemons"` | ||
} | ||
|
||
type ProvisionerDaemonsReportDeps struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels more like Config
instead of Deps
, but I'll leave it up to you.
62d42e8
to
19e896a
Compare
@johnstcn Nice contribution! Could you please link it with the umbrella issue and fill in the PR description? |
Adds a healthcheck for provisioner daemons to /debug/health endpoint.
Part of #10676