-
Notifications
You must be signed in to change notification settings - Fork 888
feat: show health check warnings #10620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
site/src/testHelpers/entities.ts
Outdated
}, | ||
database: { | ||
healthy: true, | ||
reachable: true, | ||
latency: 92570, | ||
error: null, | ||
latency: "92570", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of this field? (not in this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many places in this API response are discussable. I second the question.
// Healthy is true if the report returns no errors. | ||
Healthy bool `json:"healthy"` | ||
// Warning is true when Coder is operational but its performance might be impacted. | ||
Warning bool `json:"warning"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we get a report that has both { Healthy: false, Warning: true }
?
As far as I understand, that's an invalid state in our current model, which should not be representible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, but I'm afraid that it will not be achievable without breaking API? I introduced the "warning" field to be perceived as "it is healthy, but...".
To solve it properly, most likely we need to replace "Healthy" with enum, for instance: status: healthy
, status: degraded
, status: error
, etc.
I'm happy to chat about this as it is the right moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about it, it could be useful to have both health and warn information, as they are not necessarily completely separate.
For example, if you're using embedded database and access URL just points straight to the VM, and you get a failed healthcheck for database taking too long and warnings for access URL taking too long to respond, that can already give you an indication that perhaps the server is overloaded.
Maybe we could add
Warnings []string or map[string]string
at the top level, and any warnings from component reports get added in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this concept if only we present these warnings on the "Health" page 👍 Otherwise, it is pointless. It also means that the Health page will become more informative than just Healthy/Unhealthy + JSON dump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR touches few important aspects we may want to discuss:
Warning bool
vs.Warning []string
(Cian) vs "status enum". I don't have any preference, forWarning bool
seems to be sufficient, but I don't know where we would like to go with the design. I wouldn't like to drive a new RFC just because of that.- Can we move Healthcheck API response to
codersdk
? - So, can "site" use API responses from the
codersdk
? Right now, it uses its own custom implementation. Unfortunately, it seems to be incorrect/not fully implemented.
I will ping a couple of folks so that we have an opinionated design.
site/src/testHelpers/entities.ts
Outdated
@@ -2456,7 +2459,8 @@ export const MockHealth = { | |||
TokenBucketBytesBurst: 0, | |||
}, | |||
can_exchange_messages: true, | |||
round_trip_ping: 7674330, | |||
round_trip_ping: "7674330", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as Cian posted for "latency". Is it backward compatibility?
site/src/testHelpers/entities.ts
Outdated
@@ -2467,18 +2471,16 @@ export const MockHealth = { | |||
], | |||
], | |||
client_errs: [[], []], | |||
error: null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does UI/CLI/something use "error" fields?
// A node may use websockets because `Upgrade: DERP` may be blocked on | ||
// the load balancer. This is unhealthy because websockets are slower | ||
// than the regular DERP protocol. | ||
r.Warning = r.UsesWebsocket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the most important logical change. Using WebSockets won't be considered to be a failure anymore.
site/src/api/api.ts
Outdated
export const getHealth = async () => { | ||
const response = await axios.get<Health>("/api/v2/debug/health"); | ||
const response = await axios.get<TypesGen.HealthcheckReport>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BrunoQuaresma Do you know why it uses Health
instead of TypesGen.HealthcheckReport
? Is there any strong reason why it does not use api/api
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted it here: #10650
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember but this is a good improvement for sure.
site/src/api/api.ts
Outdated
export const getHealth = async () => { | ||
const response = await axios.get<Health>("/api/v2/debug/health"); | ||
const response = await axios.get<TypesGen.HealthcheckReport>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BrunoQuaresma It seems that make gen
does not generate DerphealthcheckReport
even though it is marked to be generated. Do you have any idea why it happens?
@@ -2097,7 +2097,9 @@ export interface HealthcheckReport {
readonly healthy: boolean;
readonly warning: boolean;
readonly failing_sections: string[];
- readonly derp: DerphealthReport;
+ // Named type "github.com/coder/coder/v2/coderd/healthcheck/derphealth.Report" unknown, using "any"
+ // eslint-disable-next-line @typescript-eslint/no-explicit-any -- External type
+ readonly derp: any;
readonly access_url: HealthcheckAccessURLReport;
readonly websocket: HealthcheckWebsocketReport;
readonly database: HealthcheckDatabaseReport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted it here: #10650
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the FE perspective, it looks good
I'm going to redo this according to the redesign notion doc. |
Fixes: #9470
This PR extends the
/api/v2/debug/health
API response to includewarning bool
field. Coder will not report "unhealthy" status if there are troubles with Websockets.