Skip to content

feat(coderd/healthcheck): allow configuring database hc threshold #10623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 13, 2023

Conversation

johnstcn
Copy link
Member

@johnstcn johnstcn commented Nov 10, 2023

Fixes #9381:

  • Exposes database healthcheck threshold in configuration

Also takes care of part of #8969:

  • Exposes healthcheck recheck interval in configuration

@johnstcn johnstcn self-assigned this Nov 10, 2023
Comment on lines 22 to 23
LatencyMs int64 `json:"latency_ms"`
ThresholdMs int64 `json:"threshold_ms"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-review: I can make these ints if preferred, but int64 is slightly less annoying to use here.

Comment on lines -52 to +50
DB database.Store
// TODO: support getting this over HTTP?
DERPMap *tailcfg.DERPMap
AccessURL *url.URL
Client *http.Client
APIKey string
AccessURL AccessURLReportOptions
Database DatabaseReportOptions
DerpHealth derphealth.ReportOptions
Websocket WebsocketReportOptions
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-review: Refactored this struct to directly pass in the individual checker options.

coderd/coderd.go Outdated
Comment on lines 419 to 423
if options.HealthcheckTimeout == 0 {
options.HealthcheckTimeout = 30 * time.Second
options.HealthcheckTimeout = options.DeploymentValues.Healthcheck.Timeout.Value()
}
if options.HealthcheckRefresh == 0 {
options.HealthcheckRefresh = 10 * time.Minute
options.HealthcheckRefresh = options.DeploymentValues.Healthcheck.Refresh.Value()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-review: Also plumbed through the healthcheck timeout and refresh and made them configurable. Addresses part of #8969 but I can break this into a separate PR if preferred.

@johnstcn johnstcn marked this pull request as ready for review November 10, 2023 17:10
@johnstcn johnstcn requested review from deansheather, coadler, mafredri and mtojek and removed request for coadler and deansheather November 10, 2023 17:10
@matifali
Copy link
Member

We should also show a hint on the database health check page on how to configure this interval.
Probably expose it somewhere on the deployment page too. We are currently exposing many of the configured options there.

@johnstcn
Copy link
Member Author

johnstcn commented Nov 13, 2023

@matifali We should also show a hint on the database health check page on how to configure this interval. Probably expose it somewhere on the deployment page too. We are currently exposing many of the configured options there.

Right now they'll show up by default under Deployment > Observability. I can move them to a separate option group and move them to a separate page if required.

image

I might need some design help with adding the hint to the health page though, so I'll defer that to a separate PR.

@matifali
Copy link
Member

I think this is fine for now. A hint could be added in a separate PR.

Thank you.

Reachable bool `json:"reachable"`
Latency string `json:"latency"`
LatencyMS int64 `json:"latency_ms"`
ThresholdMS int64 `json:"threshold_ms"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the Threshold string for consistency? I don't know what is the original though behind string+int64 pairs.

Copy link
Member Author

@johnstcn johnstcn Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me either, and the fact that I don't know why this particular fence exists makes me pause before knocking it down. I do know that _ms was added post-hoc for the UI (8ee500c) so I suspect it was simply a case of not renaming a field in the API for compatibility purposes.

Instead of parsing the Go string representation (latency) it's probably better for the UI to just create the durations from millis and then display them however is deemed best. So I would argue in favour of leaving out threshold entirely.

@mtojek mtojek self-requested a review November 13, 2023 13:45
@johnstcn johnstcn requested a review from mafredri November 13, 2023 13:45
Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in good shape to get merged 👍

@johnstcn johnstcn merged commit b69c237 into main Nov 13, 2023
@johnstcn johnstcn deleted the cj/configurable-db-hc-threshold branch November 13, 2023 14:14
@github-actions github-actions bot locked and limited conversation to collaborators Nov 13, 2023
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

# database exceeds this threshold over 5 attempts, the database is considered
# unhealthy. The default value is 15ms.
# (default: 15ms, type: duration)
thresholdDatabase: 15ms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be smoother as databaseThreshold vs thresholdDatabase (same with flags, etc)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

healthcheck: configurable database delay
4 participants