Skip to content

feat(agent): add connection reporting for SSH and reconnecing PTY #16652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 27, 2025

Conversation

mafredri
Copy link
Member

@mafredri mafredri commented Feb 21, 2025

Updates #15139

To allow merging, I've placed the reporting behind an experimental flag on the agent (--experimental-connection-reports-enable).

See #15139 for how these are represented in the UI (and plans for updating the UI).

This comment was marked as resolved.

@mafredri mafredri force-pushed the mafredri/feat-add-agent-connection-reporting branch from 1b8c15c to cd31d7e Compare February 21, 2025 13:28
@mafredri mafredri force-pushed the mafredri/feat-add-agent-connection-reporting branch from cd31d7e to 58463c6 Compare February 21, 2025 14:07
@mafredri mafredri marked this pull request as ready for review February 24, 2025 13:18
@mafredri mafredri requested review from johnstcn, spikecurtis and Emyrk and removed request for johnstcn February 24, 2025 13:20
@mafredri mafredri force-pushed the mafredri/feat-add-agent-connection-reporting branch from 4f934c8 to a77ceac Compare February 24, 2025 13:25
a.reportConnectionsMu.Lock()
if len(a.reportConnections) == 0 {
a.reportConnectionsMu.Unlock()
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a label here for clarity? break will always break the innermost loop but I always have trouble remembering that personally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you review in VS Code, the syntax highlighting can be helpful here!
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only use labels when I need to. I don't think we have examples in our code for labels breaking the inner loop. Do we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You got me 😄 ❤️

}

func (s *sessionCloseTracker) Close() error {
s.track(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 1 mean in this instance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as a shell script exiting with generic error (1). Calling Close instead of Exit on the session indicates an "error" state (not error as in log, but error as in session didn't go right).

Comment on lines +491 to +498
{
Flag: "experimental-connection-reports-enable",
Hidden: true,
Default: "false",
Env: "CODER_AGENT_EXPERIMENTAL_CONNECTION_REPORTS_ENABLE",
Description: "Enable experimental connection reports.",
Value: serpent.BoolOf(&experimentalConnectionReports),
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be something that gets set as a deployment-wide experiment and is automatically set in the agent manifest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this is an (temporary) experiment is simply to not mess up the audit log UI until it is updated. I'm not aware that this needs to be placed behind an experiment deployment wide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an alternative suggestion!

a.reportConnectionsMu.Lock()
if len(a.reportConnections) == 0 {
a.reportConnectionsMu.Unlock()
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only use labels when I need to. I don't think we have examples in our code for labels breaking the inner loop. Do we?


// Remove the payload we sent.
a.reportConnectionsMu.Lock()
a.reportConnections = a.reportConnections[1:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make reportConnections channel? If it's an append only slice that is only read in this function.

This slice behavior is correct, just feels like a weaker implementation of a channel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it could be a channel, but how big to make it? How many in-flight reports is "too many"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, a channel would not be a terrible option here. It requires upfront allocation which can either be a good or a bad thing in memory constrained systems. For now I've limited this to 2048 reports pending, or about 300KB. We can revisit this later if needed.

agent/agent.go Outdated
Comment on lines 793 to 803
a.reportConnections = append(a.reportConnections, &proto.ReportConnectionRequest{
Connection: &proto.Connection{
Id: id[:],
Action: proto.Connection_CONNECT,
Type: connectionType,
Timestamp: timestamppb.New(time.Now()),
Ip: ip,
StatusCode: 0,
Reason: nil,
},
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to worry about the size of this slice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically no. The only way these accumulate is if we have lost connection to coders. I could add some safe-guards around this to drop messages, but since it's part of auditing that feels a bit wrong. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to your judegment. If you feel it is relatively bounded, then it's good with me. Maybe leave a comment of the assumptions?

I did not spend that long understanding the full context of the code.

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock. I'm not particularly pushed on channel versus slice since this is behind an experimental flag and not enabled by default.

@mafredri mafredri force-pushed the mafredri/feat-add-agent-connection-reporting branch from c77b24e to 0318212 Compare February 27, 2025 10:20
@mafredri mafredri enabled auto-merge (squash) February 27, 2025 10:37
@mafredri mafredri merged commit 4ba5a8a into main Feb 27, 2025
30 checks passed
@mafredri mafredri deleted the mafredri/feat-add-agent-connection-reporting branch February 27, 2025 10:45
@github-actions github-actions bot locked and limited conversation to collaborators Feb 27, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants