feat: modify coordinators to send errors and peers to log them #17467

Merged
spikecurtis merged 2 commits into main from spike/send-and-log-coordinate-errors on Apr 21, 2025

Conversation

spikecurtis
Contributor

@spikecurtis spikecurtis commented Apr 18, 2025

Adds support to our coordinator implementations to send Error updates before disconnecting clients.

I was recently debugging a connection issue where the client was getting repeatedly disconnected from the Coordinator, but since we never send any error information, it was really hard to diagnose without server logs.

This PR aims to correct that, by sending a CoordinateResponse with Error set in cases where we disconnect a client without them asking us to.

It also logs the error whenever we get one in the client controller.

@spikecurtis spikecurtis marked this pull request as ready for review April 18, 2025 11:53
@spikecurtis spikecurtis force-pushed the spike/send-and-log-coordinate-errors branch from b7d0071 to 9cc149f Compare April 18, 2025 11:55
// ReadyForHandshake error can occur during race conditions, where we send a ReadyForHandshake message,
// but the source has already disconnected from the tunnel by the time we do. So, just log at warning.
if strings.HasPrefix(resp.Error, ReadyForHandshakeError) {
	c.logger.Warn(context.Background(), "coordination warning", slog.F("msg", resp.Error))
Contributor Author

This is a bit fragile in that it's special handling code for a specific error. But, I'm not sure we want to add more structure to the coordination protocol error messages for just one odd-duck error message.

Without this, many tests will flake due to dropping a benign error message.

@ethanndickson what do you think?
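
For readers following along, the prefix-based special case can be isolated into a small function. This is a minimal sketch, not the code under review: `levelFor` is a hypothetical helper, and the string assigned to `readyForHandshakeError` is an assumed placeholder, since the real constant's value is not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// readyForHandshakeError is an assumed placeholder value; the text of the
// real ReadyForHandshakeError constant is not shown in this thread.
const readyForHandshakeError = "ready for handshake error"

// levelFor picks a log level for an error string received from the
// coordinator. Only the one known-benign prefix is downgraded to WARN.
func levelFor(errMsg string) string {
	if strings.HasPrefix(errMsg, readyForHandshakeError) {
		return "WARN"
	}
	return "ERROR"
}

func main() {
	fmt.Println(levelFor(readyForHandshakeError + ": peer already disconnected"))
	fmt.Println(levelFor("unauthorized"))
}
```

The fragility mentioned above is visible here: the classification depends on the sender never changing the wording of the prefix.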

Member

@ethanndickson ethanndickson Apr 21, 2025

Ah, I was indeed missing those "log detected at level ERROR" failures...

All those test failures are in CLI integration tests. We already ignore error logs on coderdtest for this same reason. Is it feasible to ignore agent error logs too (and only in integration tests)?

If not, this sounds fine to me as a warning.

Contributor Author

@spikecurtis spikecurtis Apr 21, 2025

It's feasible, but I worry it's overbroad.

The ready for handshake "errors" aren't "errors" by the criteria we usually use for logs, which is that we definitely know something bad happened. They are closer to "something bad might have happened, or it might be fine", which is where we usually put WARN logs.

So, I think this behavior is more correct, I'm just not wild about the implementation. But, also not wild about alternative implementations that make the error vs warning explicit in the protocol.

Member

I think it's fine to write as a one-off. If we ever need to communicate more than one type of warning using the protocol, then maybe we reconsider.

Contributor Author

Yeah, that's what I was thinking. If we have to start special casing more than one, it will be worthwhile to make it explicit in the protocol.
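
To make the deferred alternative concrete, here is one way an explicit severity could look. This is purely a hypothetical sketch: the `Severity` type, the `CoordinateError` struct, and `logLevel` are invented for illustration, and nothing like them was added to the protocol in this PR.

```go
package main

import "fmt"

// Severity is a hypothetical field a future protocol revision might carry.
type Severity int

const (
	// SeverityError is the zero value, so a sender that never sets the
	// field would still be treated as reporting an error.
	SeverityError Severity = iota
	SeverityWarning
)

// CoordinateError is a hypothetical structured replacement for the plain
// error string; it is not part of the protocol discussed in this PR.
type CoordinateError struct {
	Severity Severity
	Message  string
}

// logLevel maps the explicit severity to a log level, with no string
// matching on the message text.
func logLevel(e CoordinateError) string {
	switch e.Severity {
	case SeverityWarning:
		return "WARN"
	default:
		return "ERROR"
	}
}

func main() {
	fmt.Println(logLevel(CoordinateError{Severity: SeverityWarning, Message: "peer already disconnected"}))
	fmt.Println(logLevel(CoordinateError{Message: "unauthorized"}))
}
```

Making error the zero value keeps the default conservative: anything not explicitly marked as a warning is still logged as an error.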

@spikecurtis spikecurtis merged commit 345435a into main Apr 21, 2025
35 checks passed
@spikecurtis spikecurtis deleted the spike/send-and-log-coordinate-errors branch April 21, 2025 07:40
@github-actions github-actions bot locked and limited conversation to collaborators Apr 21, 2025