feat: add derp mesh health checking in workspace proxies #12222

deansheather
Member

deansheather commented Feb 20, 2024

Adds the missing health-check code for sibling replicas (similar to the replicasync package in the primary).

Reports errors via a callback (for tests), logs, the healthz-report endpoint, and registration requests.

Related to coder/customers#438

deansheather requested a review from coadler February 20, 2024 09:46
deansheather marked this pull request as ready for review February 20, 2024 09:48
replicaPingSingleflight singleflight.Group
replicaErrMut sync.Mutex
replicaErr string
latestDERPMap atomic.Pointer[tailcfg.DERPMap]
Contributor

I feel like the DERP stuff should be refactored out into a subcomponent of the proxy.

var (
wg sync.WaitGroup
mu sync.Mutex
failed = []string{}
Contributor

I think this would be easier to follow and more idiomatic if you made a chan error and, for each peer, wrote nil on success or the error on failure.

Then you read from the channel until you have received one result per peer, building up your error message as you go. This avoids the mutex and WaitGroup as well.
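
For illustration, the channel-based approach suggested here might look roughly like the sketch below. It is not the code from this PR: pingPeers, pingFunc, and errUnhealthy are hypothetical names, and a real implementation would presumably also take a context.Context for cancellation and timeouts.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errUnhealthy is a hypothetical sentinel error used only in this sketch.
var errUnhealthy = errors.New("unhealthy")

// pingFunc is a hypothetical stand-in for the health check that the proxy
// runs against a single sibling replica.
type pingFunc func(addr string) error

// pingPeers pings every peer concurrently. It returns nil if all pings
// succeed, or one error listing each peer that failed.
func pingPeers(peers []string, ping pingFunc) error {
	// Buffered to len(peers) so every goroutine can send without blocking.
	results := make(chan error, len(peers))
	for _, addr := range peers {
		go func(addr string) {
			if err := ping(addr); err != nil {
				results <- fmt.Errorf("%s: %w", addr, err)
				return
			}
			results <- nil
		}(addr)
	}

	// Read exactly one result per peer; no mutex or WaitGroup is needed
	// because the channel is the only shared state.
	var failed []string
	for range peers {
		if err := <-results; err != nil {
			failed = append(failed, err.Error())
		}
	}
	if len(failed) == 0 {
		return nil
	}
	// Sort so the message does not depend on goroutine scheduling.
	sort.Strings(failed)
	return errors.New("failed to ping peers: " + strings.Join(failed, "; "))
}
```

Receiving exactly len(peers) values doubles as the synchronization point, which is why the separate wg and mu variables become unnecessary in this shape.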

github-actions bot added the stale label Mar 2, 2024
github-actions bot closed this Mar 5, 2024
deansheather reopened this Mar 5, 2024
deansheather removed the stale label Mar 5, 2024
Base automatically changed from dean/proxy-derp-mesh-test to main March 5, 2024 07:40
},
})

// Register (but don't start) 5 other proxies in the same region. Since
Contributor

This comment is confusing to me: it looks like registerBrokenProxy starts an HTTP server, so in what sense do you mean "Register (but don't start)"?

Member Author

The server is spun up but only responds with 500. We need to associate a unique IP:port with each replica, so instead of hoping that the port I picked doesn't have a TCP listener on it, we start a real server and make all healthchecks fail, which has the same effect.
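
As a rough sketch of the idea described above (not the registerBrokenProxy helper from this PR), a test can get a unique, really-listening address per replica from net/http/httptest and have it answer every request with 500. The names newBrokenReplica and isHealthy are hypothetical, and treating only a 200 response as healthy is an assumption of this sketch.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// newBrokenReplica starts a real HTTP server on a free loopback port that
// answers every request with 500. Hypothetical helper for illustration.
func newBrokenReplica() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		http.Error(w, "replica is intentionally unhealthy", http.StatusInternalServerError)
	}))
}

// isHealthy reports whether a GET to url returns 200 OK. Transport errors
// and non-200 statuses both count as unhealthy in this sketch.
func isHealthy(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}
```

Because httptest.NewServer binds its own listener, each broken replica gets a distinct URL without the test having to guess at an unused port.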

Member Author

Comment updated

deansheather requested a review from spikecurtis March 8, 2024 05:39
Contributor

spikecurtis left a comment

LGTM

deansheather merged commit d2a5b31 into main Mar 8, 2024
deansheather deleted the 02-20-feat_add_derp_mesh_health_checking_in_workspace_proxies branch March 8, 2024 06:31
github-actions bot locked and limited conversation to collaborators Mar 8, 2024