chore: zero out session stats from agent with experiment enabled #13579
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,8 @@ import ( | |
|
||
"cdr.dev/slog" | ||
"github.com/coder/coder/v2/agent/proto" | ||
"github.com/coder/coder/v2/codersdk" | ||
"github.com/coder/coder/v2/codersdk/agentsdk" | ||
) | ||
|
||
const maxConns = 2048 | ||
|
@@ -22,23 +24,25 @@ type statsCollector interface { | |
Collect(ctx context.Context, networkStats map[netlogtype.Connection]netlogtype.Counts) *proto.Stats | ||
} | ||
|
||
type statsDest interface { | ||
type statsAPI interface { | ||
GetExperiments(ctx context.Context, req *proto.GetExperimentsRequest) (*proto.GetExperimentsResponse, error) | ||
UpdateStats(ctx context.Context, req *proto.UpdateStatsRequest) (*proto.UpdateStatsResponse, error) | ||
} | ||
|
||
// statsReporter is a subcomponent of the agent that handles registering the stats callback on the | ||
// networkStatsSource (tailnet.Conn in prod), handling the callback, calling back to the | ||
// statsCollector (agent in prod) to collect additional stats, then sending the update to the | ||
// statsDest (agent API in prod) | ||
// statsAPI (agent API in prod) | ||
type statsReporter struct { | ||
*sync.Cond | ||
networkStats *map[netlogtype.Connection]netlogtype.Counts | ||
unreported bool | ||
lastInterval time.Duration | ||
|
||
source networkStatsSource | ||
collector statsCollector | ||
logger slog.Logger | ||
source networkStatsSource | ||
collector statsCollector | ||
logger slog.Logger | ||
experiments codersdk.Experiments | ||
} | ||
|
||
func newStatsReporter(logger slog.Logger, source networkStatsSource, collector statsCollector) *statsReporter { | ||
|
@@ -66,7 +70,15 @@ func (s *statsReporter) callback(_, _ time.Time, virtual, _ map[netlogtype.Conne | |
// connection to the agent API, then passes that connection to go routines like | ||
// this that use it. There is no retry and we fail on the first error since | ||
// this will be inside a larger retry loop. | ||
func (s *statsReporter) reportLoop(ctx context.Context, dest statsDest) error { | ||
func (s *statsReporter) reportLoop(ctx context.Context, dest statsAPI) error { | ||
exp, err := dest.GetExperiments(ctx, &proto.GetExperimentsRequest{}) | ||
if err != nil { | ||
return xerrors.Errorf("get experiments: %w", err) | ||
} | ||
s.L.Lock() | ||
s.experiments = agentsdk.ExperimentsFromProto(exp) | ||
|
||
s.L.Unlock() | ||
|
||
// send an initial, blank report to get the interval | ||
resp, err := dest.UpdateStats(ctx, &proto.UpdateStatsRequest{}) | ||
if err != nil { | ||
|
@@ -105,13 +117,24 @@ func (s *statsReporter) reportLoop(ctx context.Context, dest statsDest) error { | |
} | ||
|
||
func (s *statsReporter) reportLocked( | ||
ctx context.Context, dest statsDest, networkStats map[netlogtype.Connection]netlogtype.Counts, | ||
ctx context.Context, dest statsAPI, networkStats map[netlogtype.Connection]netlogtype.Counts, | ||
) error { | ||
// here we want to do our collecting/reporting while it is unlocked, but then relock | ||
// when we return to reportLoop. | ||
s.L.Unlock() | ||
defer s.L.Lock() | ||
stats := s.collector.Collect(ctx, networkStats) | ||
|
||
// if the experiment is enabled we zero out certain session stats | ||
// as we migrate to the client reporting these stats instead. | ||
Comment on lines +128 to +129

Can this get implemented serverside instead? It seems like a lot of changes in the agent for something that could be zeroed out in the stats API on the server.

+1, this would keep the change in 'one place' and remove the need for plumbing this through AgentAPI

Not really and here's why - If customers update coder but do not rebuild the workspace, we will have an old agent client and old cli version in the workspace. If the experiment is enabled and we zero out the value server side, we will lose data for that workspace because the old cli will not be reporting the new data via the usage endpoint. So the graphs will go to zero and slowly come back up over time as each workspace is restarted. So we need the endpoint to still save this data for old workspaces, but also allow new workspaces to report it in the new way. Or we accept downtime on the stats on upgrade until all workspaces are updated. Thoughts?

Won't the same thing happen even if the feature was GA? Old agent/CLI versions need to be supported on new servers for multiple months. This will also affect stats collection for older CLIs being used outside of workspaces, which arguably is where more people use the CLI anyways. I don't think anyone is using the CLI in a workspace to connect to a workspace.

I think either way there will be a "downtime" of stats with or without the agent changes when an older CLI is being used on the local machine. I think this PR should change into just the CLI stats upload portion and have both sides report stats for now. Then in a few months you can remove stats reporting from the API by deprecating the field and ignoring it serverside.
||
if s.experiments.Enabled(codersdk.ExperimentWorkspaceUsage) { | ||
stats.SessionCountSsh = 0 | ||
// TODO: More session types will be enabled as we migrate over. | ||
// stats.SessionCountVscode = 0 | ||
// stats.SessionCountJetbrains = 0 | ||
// stats.SessionCountReconnectingPty = 0 | ||
} | ||
|
||
resp, err := dest.UpdateStats(ctx, &proto.UpdateStatsRequest{Stats: stats}) | ||
if err != nil { | ||
return err | ||
|
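For readers following the thread, the gating in `reportLocked` can be isolated into a small helper that is easy to test on its own. This is only a sketch and not code from this PR: `Experiments`, `Stats`, `experimentWorkspaceUsage` (including its string value) and `zeroMigratedSessionStats` are hypothetical stand-ins for `codersdk.Experiments`, `proto.Stats` and `codersdk.ExperimentWorkspaceUsage`, trimmed so the example compiles without the Coder modules.

```go
package main

// Experiments is a hypothetical stand-in for codersdk.Experiments:
// the set of experiment names the server reports as enabled.
type Experiments []string

// Enabled reports whether the named experiment is in the set.
func (e Experiments) Enabled(name string) bool {
	for _, x := range e {
		if x == name {
			return true
		}
	}
	return false
}

// Stats is a hypothetical, trimmed-down stand-in for proto.Stats.
type Stats struct {
	SessionCountSSH    int64
	SessionCountVSCode int64
}

// experimentWorkspaceUsage stands in for codersdk.ExperimentWorkspaceUsage.
// The string value here is an assumption made for this sketch.
const experimentWorkspaceUsage = "workspace-usage"

// zeroMigratedSessionStats clears the session counters that the client is
// expected to report itself once the experiment is enabled. Only the SSH
// counter is cleared, mirroring the diff above; every other counter is
// left untouched. A nil stats pointer is treated as a no-op.
func zeroMigratedSessionStats(exps Experiments, stats *Stats) {
	if stats == nil || !exps.Enabled(experimentWorkspaceUsage) {
		return
	}
	stats.SessionCountSSH = 0
}
```

With the experiment disabled the helper leaves the stats as collected, which is the behaviour an older server or a deployment without the experiment would see. Whether this belongs in the agent at all is the open question in the review thread above.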