-
Notifications
You must be signed in to change notification settings - Fork 887
fix: Enable reconnectingpty loadtest and fix/improve logging #5403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0393aa2
to
fda0bb7
Compare
agent/agent.go
Outdated
prevBuf := slices.Clone(rpty.circularBuffer.Bytes()) | ||
rpty.circularBufferMutex.RUnlock() | ||
_, err = conn.Write(prevBuf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: Unrelated fix(?). It seemed like a bad idea (to me) to perform a write on a network connection whilst holding the mutex, this shortens the span.
However, neither now, nor before, do we achieve perfect synchronization. In theory we should ensure that writes happening between writing prevBuf
to conn and adding conn to activeConns
are not lost. I did not try to fix this issue in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. It might be worth leaving that in a comment.
agent/agent.go
Outdated
} | ||
_, err = rpty.ptty.Input().Write([]byte(req.Data)) | ||
if err != nil { | ||
a.logger.Warn(ctx, "write to reconnecting pty", slog.F("id", msg.ID), slog.Error(err)) | ||
return | ||
logger.Warn(ctx, "write to pty", slog.Error(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: It's unclear to my why some logs have been Warn
whereas others are Error
, I did however notice some tests fail unless I mimic the previous behavior exactly, which is what I've attempted here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's unfortunate D: I believe they were best effort
@@ -604,6 +615,7 @@ func AwaitWorkspaceAgents(t *testing.T, client *codersdk.Client, workspaceID uui | |||
|
|||
return true | |||
}, testutil.WaitLong, testutil.IntervalFast) | |||
t.Logf("got workspace agents (workspace %s)", workspaceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: These log messages help discern the order of events.
defer conn.Close() | ||
|
||
connectionID := uuid.NewString() | ||
logger = logger.With(slog.F("id", msg.ID), slog.F("connection_id", connectionID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: Addition of connection_id
will help debug if logs are coming from one or multiple connections.
@@ -300,10 +301,12 @@ func (a *agent) createTailnet(ctx context.Context, derpMap *tailcfg.DERPMap) (_ | |||
} | |||
}() | |||
if err = a.trackConnGoroutine(func() { | |||
logger := a.logger.Named("reconnecting-pty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: Named logging seemed like it would be helpful here, even though it's not fully a separate service.
server.init(ctx) | ||
return server | ||
a.init(ctx) | ||
return a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: This just bothered me, lol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
;p I agree
This commit re-enabled reconnectingpty loadtests after a logging refactor of `(*agent).handleReconnectingPTY`. The reasons the tests were flaking was that `logger.Error` was being called and `slogtest` failing the test. We could have set the option for `slogtest` to disable failing, but that could hide real issues. The current approach improves reconnectingpty logging overall and provides more insight into what's happening. It's expected that reconnectingpty sessions fail after the agent is closed, so calling `logger.Error` at that point is not wanted. Ref: #5322
fda0bb7
to
772dff1
Compare
While working on #5403 I noticed the web terminal connects with a different UUID than what is serialized in the URL, this resulted in a new session being opened if I refreshed the page. The reasons seems to be that we were accessing raw DOM elements (`location.search`) for navigation and omitting `react-router-dom`. This resulted in a new navigation and UUID being generated after the fact.
server.init(ctx) | ||
return server | ||
a.init(ctx) | ||
return a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
;p I agree
agent/agent.go
Outdated
prevBuf := slices.Clone(rpty.circularBuffer.Bytes()) | ||
rpty.circularBufferMutex.RUnlock() | ||
_, err = conn.Write(prevBuf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. It might be worth leaving that in a comment.
3051881
to
41e119c
Compare
This commit re-enabled reconnectingpty loadtests after a logging
refactor of
(*agent).handleReconnectingPTY
. The reasons the tests wereflaking was that
logger.Error
was being called andslogtest
failingthe test.
We could have set the option for
slogtest
to disable failing, but thatcould hide real issues. The current approach improves reconnectingpty
logging overall and provides more insight into what's happening. It's
expected that reconnectingpty sessions fail after the agent is closed,
so calling
logger.Error
at that point is not wanted.Ref: #5322