fix: prevent single replica proxies from staying unhealthy (#12641)
In the peer healthcheck code, when an error pinging peers is detected we
write a "replicaErr" string with the error reason. However, if there were
no peer replicas to ping, we returned early without resetting the string to
empty. As a result, a replica whose peers had been failing and then left
would keep showing an error until a new peer appeared.
Also demotes DERP replica checking to a "warning" rather than an "error",
which should prevent the primary from removing the proxy from the region
map if DERP meshing is non-functional. Meshing can fail harmlessly, for
example while a peer is shutting down, so we don't want to disrupt
everything when there isn't a real issue.
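The effect of the demotion can be sketched as follows. It assumes a hypothetical, simplified `healthReport` struct with `Errors` and `Warnings` slices, and a hypothetical `keepInRegionMap` rule in which the primary drops a proxy only when the report contains errors. The real report type and primary-side logic may differ.

```go
package main

import "fmt"

// healthReport is a hypothetical, simplified proxy health report.
type healthReport struct {
	Errors   []string
	Warnings []string
}

// buildReport records a DERP mesh problem as a warning rather than an
// error, mirroring the severity change described above.
func buildReport(replicaErr string) healthReport {
	var r healthReport
	if replicaErr != "" {
		r.Warnings = append(r.Warnings,
			"High availability networking: replicas are unable to establish a mesh: "+replicaErr)
	}
	return r
}

// keepInRegionMap is a hypothetical primary-side rule: only errors cause
// the proxy to be removed; warnings are surfaced but tolerated.
func keepInRegionMap(r healthReport) bool {
	return len(r.Errors) == 0
}

func main() {
	r := buildReport("Failed to dial peers: replica-2")
	fmt.Println(keepInRegionMap(r), len(r.Warnings)) // true 1
}
```

Under this rule a broken mesh is still visible to operators through the warning, but it no longer takes the proxy out of rotation.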
(cherry picked from commit cf50461)
-		//nolint:nilnil // we don't actually use the return value of the
-		// singleflight here
-		return nil, nil
+		return fmt.Sprintf("Failed to dial peers: %s", strings.Join(replicaErrs, ", ")), nil
 	})
+
+	//nolint:forcetypeassert
+	return errStrInterface.(string)
 }
 
 func (s *Server) handleRegisterFailure(err error) {
@@ -590,7 +590,8 @@ func (s *Server) healthReport(rw http.ResponseWriter, r *http.Request) {
 
 	s.replicaErrMut.Lock()
 	if s.replicaErr != "" {
-		report.Errors = append(report.Errors, "High availability networking: it appears you are running more than one replica of the proxy, but the replicas are unable to establish a mesh for networking: "+s.replicaErr)
+		report.Warnings = append(report.Warnings,
+			"High availability networking: it appears you are running more than one replica of the proxy, but the replicas are unable to establish a mesh for networking: "+s.replicaErr)
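The new return path in the ping function stores the aggregated message as the closure's `any` result and then asserts it back to a `string`. Below is a minimal stdlib-only sketch of that pattern. `runOnce` is a hypothetical stand-in for the singleflight `Do` call, which also returns its result as `any`. The sketch assumes the closure returns `""` when no peer failed, which the diff does not show. It also uses a comma-ok assertion as a defensive alternative to the direct `//nolint:forcetypeassert` assertion in the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// runOnce is a hypothetical stand-in for a singleflight-style call that
// returns its result as an untyped value.
func runOnce(fn func() (any, error)) (any, error) {
	return fn()
}

// pingPeersResult returns the aggregated error string, or "" when no
// peer reported a failure (assumed behavior, see lead-in).
func pingPeersResult(replicaErrs []string) string {
	// The closure never returns a non-nil error here, so it is ignored.
	v, _ := runOnce(func() (any, error) {
		if len(replicaErrs) == 0 {
			return "", nil
		}
		return fmt.Sprintf("Failed to dial peers: %s", strings.Join(replicaErrs, ", ")), nil
	})

	// Comma-ok assertion: instead of panicking on an unexpected type,
	// report a non-empty message so the replica is not silently marked healthy.
	s, ok := v.(string)
	if !ok {
		return fmt.Sprintf("unexpected ping result type %T", v)
	}
	return s
}

func main() {
	fmt.Println(pingPeersResult([]string{"a: timeout", "b: refused"}))
	// Failed to dial peers: a: timeout, b: refused
}
```

The direct assertion in the commit is safe as long as every closure path returns a `string`. The comma-ok form only matters if that invariant could be broken by a later change.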