-
Notifications
You must be signed in to change notification settings - Fork 889
feat(coderd/healthcheck): allow configuring database hc threshold #10623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
coderd/healthcheck/database.go
Outdated
LatencyMs int64 `json:"latency_ms"` | ||
ThresholdMs int64 `json:"threshold_ms"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-review: I can make these int
s if preferred, but int64 is slightly less annoying to use here.
DB database.Store | ||
// TODO: support getting this over HTTP? | ||
DERPMap *tailcfg.DERPMap | ||
AccessURL *url.URL | ||
Client *http.Client | ||
APIKey string | ||
AccessURL AccessURLReportOptions | ||
Database DatabaseReportOptions | ||
DerpHealth derphealth.ReportOptions | ||
Websocket WebsocketReportOptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-review: Refactored this struct to directly pass in the individual checker options.
coderd/coderd.go
Outdated
if options.HealthcheckTimeout == 0 { | ||
options.HealthcheckTimeout = 30 * time.Second | ||
options.HealthcheckTimeout = options.DeploymentValues.Healthcheck.Timeout.Value() | ||
} | ||
if options.HealthcheckRefresh == 0 { | ||
options.HealthcheckRefresh = 10 * time.Minute | ||
options.HealthcheckRefresh = options.DeploymentValues.Healthcheck.Refresh.Value() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-review: Also plumbed through the healthcheck timeout and refresh and made them configurable. Addresses part of #8969 but I can break this into a separate PR if preferred.
We should also show a hint on the database health check page on how to configure this interval. |
Right now they'll show up by default under Deployment > Observability. I can move them to a separate option group and move them to a separate page if required. I might need some design help with adding the hint to the health page though, so I'll defer that to a separate PR. |
I think this is fine for now. A hint could be added in a separate PR. Thank you. |
Reachable bool `json:"reachable"` | ||
Latency string `json:"latency"` | ||
LatencyMS int64 `json:"latency_ms"` | ||
ThresholdMS int64 `json:"threshold_ms"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the Threshold string
for consistency? I don't know what is the original though behind string+int64 pairs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me either, and the fact that I don't know why this particular fence exists makes me pause before knocking it down. I do know that _ms
was added post-hoc for the UI (8ee500c) so I suspect it was simply a case of not renaming a field in the API for compatibility purposes.
Instead of parsing the Go string representation (latency
) it's probably better for the UI to just create the durations from millis and then display them however is deemed best. So I would argue in favour of leaving out threshold
entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in good shape to get merged 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
# database exceeds this threshold over 5 attempts, the database is considered | ||
# unhealthy. The default value is 15ms. | ||
# (default: 15ms, type: duration) | ||
thresholdDatabase: 15ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be smoother as databaseThreshold
vs thresholdDatabase
(same with flags, etc)
Fixes #9381:
Also takes care of part of #8969: