feat: modify coordinators to send errors and peers to log them #17467
// ReadyForHandshake error can occur during race conditions, where we send a ReadyForHandshake message,
// but the source has already disconnected from the tunnel by the time we do. So, just log at warning.
if strings.HasPrefix(resp.Error, ReadyForHandshakeError) {
	c.logger.Warn(context.Background(), "coordination warning", slog.F("msg", resp.Error))
This is a bit fragile in that it's special handling code for a specific error. But, I'm not sure we want to add more structure to the coordination protocol error messages for just one odd-duck error message.
Without this, many tests will flake due to dropping a benign error message.
@ethanndickson what do you think?
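For readers following the thread, the special case under discussion can be sketched in isolation. This is only an illustration: `readyForHandshakeError` and its value are hypothetical stand-ins for the `ReadyForHandshakeError` constant in the diff (its real value is not shown here), and `severityFor` is an invented helper, not a function from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// readyForHandshakeError is a hypothetical stand-in for the prefix constant
// used in the diff; the real value is an assumption.
const readyForHandshakeError = "ready for handshake error"

// severityFor returns "warn" for the benign ready-for-handshake race,
// "error" for any other non-empty error string, and "" when there is no error.
func severityFor(respError string) string {
	switch {
	case respError == "":
		return ""
	case strings.HasPrefix(respError, readyForHandshakeError):
		return "warn"
	default:
		return "error"
	}
}

func main() {
	for _, e := range []string{
		readyForHandshakeError + ": peer is not connected",
		"coordinator closed",
	} {
		fmt.Printf("%s: %s\n", severityFor(e), e)
	}
}
```

The fragility mentioned above is visible here: the classification depends entirely on the sender and receiver agreeing on the prefix text.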
Ah, I was indeed missing those "log detected at level ERROR"s...
All those test failures are in CLI integration tests. We already ignore error logs on coderdtest for this same reason, is it feasible to ignore agent error logs too (and only in integration tests)?
If not, this sounds fine to me as a warning.
It's feasible, but I worry it's overbroad.
The ready for handshake "errors" aren't "errors" by the criteria we usually use for logs, which is that we definitely know something bad happened, rather than "something bad might have happened, or it might be fine" which is where we usually put WARN logs.
So, I think this behavior is more correct, I'm just not wild about the implementation. But, also not wild about alternative implementations that make the error vs warning explicit in the protocol.
I think it's fine to write as a one-off. If we ever need to communicate more than one type of warning using the protocol, then maybe we reconsider.
Yeah, that's what I was thinking. If we have to start special-casing more than one, it will be worthwhile to make it explicit in the protocol.
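As a rough picture of what "explicit in the protocol" could mean, here is a purely hypothetical sketch. Nothing in it exists in the coordination protocol or in this PR: `Severity`, `coordinateError`, and `logLevel` are invented names, and the thread's conclusion is to not add this structure for now.

```go
package main

import "fmt"

// Severity is a hypothetical field; the protocol in this PR carries only an
// error string.
type Severity int

const (
	SeverityError   Severity = iota // zero value, so an unset severity is treated as an error
	SeverityWarning
)

// coordinateError is a hypothetical structured form of the error message.
type coordinateError struct {
	Severity Severity
	Message  string
}

// logLevel picks a log level from the explicit severity, with no string matching.
func logLevel(e coordinateError) string {
	if e.Severity == SeverityWarning {
		return "WARN"
	}
	return "ERROR"
}

func main() {
	fmt.Println(logLevel(coordinateError{Severity: SeverityWarning, Message: "peer already disconnected"}))
	fmt.Println(logLevel(coordinateError{Message: "coordinator shutting down"}))
}
```

The trade-off is the one raised in the review: this removes the prefix check at the cost of extra protocol surface for a single odd case.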
Adds support to our coordinator implementations to send Error updates before disconnecting clients.
I was recently debugging a connection issue where the client was getting repeatedly disconnected from the Coordinator, but since we never send any error information it was really hard without server logs.
This PR aims to correct that, by sending a CoordinateResponse with Error set in cases where we disconnect a client without them asking us to. It also logs the error whenever we get one in the client controller.