-
Notifications
You must be signed in to change notification settings - Fork 894
feat: Implement aggregator for agent metrics #7259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
Show all changes
46 commits
Select commit
Hold shift + click to select a range
6516216
API contract
mtojek dc202c4
Send agent metrics
mtojek 7747f2d
Ignore metrics to save bandwidth
mtojek 9fd4ddb
fix lint
mtojek 9af0246
logEntry
mtojek 4207dff
make gen
mtojek 99fe1bf
Use errGroup
mtojek df80e9b
Use MustNewConstMetric
mtojek d86496e
PoC works
mtojek 10e6d8d
Metrics aggregator with channels
mtojek 8df9eea
Metrics expiry
mtojek 1f5273b
histograms
mtojek 1b8c486
unit test
mtojek 423420b
fmt
mtojek 23bbe94
test: metrics can expire
mtojek b7011ae
Aggregator
mtojek 29a8702
Address PR comments
mtojek 7acd113
wrap errors
mtojek b15c7b7
fix
mtojek 2ae7e4e
Update coderd/prometheusmetrics/aggregator.go
mtojek b04d232
refactor: PTY & SSH (#7100)
spikecurtis 1d93f66
feat(community-templates): Added vscode-server-template (#7219)
nanospearing c604633
chore: Proxy health status checks + endpoint (#7233)
Emyrk 7d84745
Revert "feat(UI): add workspace restart button (#7137)" (#7268)
Kira-Pilot 407c332
refactor(site): Group app and agent actions together (#7267)
BrunoQuaresma 49b81df
fix(coderd): ensure that user API keys are deleted when a user is (#7…
johnstcn 44217de
chore(dogfood): remove unnecessary docker host replace (#7269)
coadler e659c36
Fix macOS pty race with dropped output (#7278)
spikecurtis 6dc8b1f
feat: add regions endpoint for proxies feature (#7277)
deansheather d2233be
fix(healthcheck): don't allow panics to exit coderd (#7276)
coadler f3f5bed
chore: add security advisories to docs (#7282)
johnstcn 50f60cb
fix(site): Do not show template params if there is no param to be dis…
BrunoQuaresma 1bf1b06
fix(site): Fix default value for options (#7265)
BrunoQuaresma 5f6b4dc
chore: fix flake in apptest reconnecting-pty test (#7281)
deansheather 9141f7c
Reconnecting PTY waits for command output or EOF (#7279)
spikecurtis e0879b5
docs(site): Mention template editor in template edit docs (#7261)
BrunoQuaresma b6322d1
fix(site): Fix secondary buttons with popovers (#7296)
BrunoQuaresma 1e3eb06
chore: change some wording in the dashboard (#7293)
bpmct 366859b
feat(agent): add http debug routes for magicsock (#7287)
coadler ed8106d
feat: add license expiration warning (#7264)
rodrimaia 5733abc
feat: add license settings UI (#7210)
rodrimaia 4937e75
chore: add envbox documentation (#7198)
sreya 619e470
docs: Fix relay link in HA doc (#7159)
winter0mute 16b5353
Merge branch 'main' into 6724-api-collect-metrics
mtojek c1bd4d2
Refactor Collect channel
mtojek 8baed98
fix
mtojek File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Metrics expiry
- Loading branch information
commit 8df9eeac51cff84e37dd7691cfea2a2ae2cf9808
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this will be used, but I worry about the use of channels. Essentially the way this behaves now is that it's enough for one misbehaving consumer to lock up this entire goroutine since it's dependent on the collected metrics being consumed in a timely manner.
So essentially we can insert 128 entries here immediately, but if more is queued up then it's on the one who called
Collect(ch)
to ensure things progress smoothly. This could prevent update and cleanup.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
128 entries correspond to 128 agents submitting their metrics at the same time. I'm aware of the issue, but considering our scale, I don't think that we should encounter this problem soon. Speaking of a potential solution, do you think that we should introduce a "middle channel" to timeout the
Collect(ch)
if metrics are not streamed? I'm afraid that it may complicate the existing logic.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see. I think another option, instead of an intermediate channel would be to send a slice over this channel instead of 128 items over the channel. Then Collect can process the slice instead of consuming the channel and the goroutine can immediately continue its work. This way the logic doesn't need to change much at all. This ofc leads to increased memory usage when we exceed 128, but then again, if we do exceed it now it might not be ideal either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case
Collect
will need to know when the slice is ready to be "ranged over", otherwise it would be a data race.I slightly modified your idea. I kept the channel, but I'm passing the entire slice at once, so 128 items limit does not exist anymore. Please let me know your thoughts, @mafredri.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, and it actually looks like how I intended it 😄