-
Notifications
You must be signed in to change notification settings - Fork 889
feat: add metrics to PGPubsub #11971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @spikecurtis and the rest of your teammates on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file isn't new, it's just renamed from pubsub_test.go since it was Linux only and the metrics tests runs on any platform.
bbf6e1b
to
899ec80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few questions/suggestions.
p.logger.Debug(p.ctx, "publish", slog.F("event", event), slog.F("message_len", len(message))) | ||
// This is safe because we are calling pq.QuoteLiteral. pg_notify doesn't | ||
// support the first parameter being a prepared statement. | ||
//nolint:gosec | ||
_, err := p.db.ExecContext(p.ctx, `select pg_notify(`+pq.QuoteLiteral(event)+`, $1)`, message) | ||
if err != nil { | ||
p.publishesTotal.WithLabelValues("false").Inc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does false label represent error here? Is that common prom metric practice? (Just checking since I'm not that experienced adding prom metrics.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The label name is success
with values true
or false
. Just trying to keep our labeling consistent within the product and that's how we do it at
coder/agent/agentscripts/agentscripts.go
Line 66 in 3ace798
}, []string{"success"}), |
p.mut.Lock() | ||
defer p.mut.Unlock() | ||
defer func() { | ||
if err != nil { | ||
// if we hit an error, we need to close the queue so we don't | ||
// leak its goroutine. | ||
newQ.close() | ||
p.subscribesTotal.WithLabelValues("false").Inc() | ||
} else { | ||
p.subscribesTotal.WithLabelValues("true").Inc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about including the event as well? Might be useful to see if there's an abnormal amount of subs/pubs/data going for a specific event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a bad idea because some events include the UUID of the thing, e.g. workspace, which would make the set of label values unbounded, which is a cardinal sin in metrics.
Not for this PR, but if we decide we really want that information, we could standardize event channel names to be a struct with a static root and then zero or more specific IDs, and then we could label the publishes, subscribes and messages with the static root.
899ec80
to
ff526cc
Compare
Merge activity
|
Adds prometheus metrics to PGPubsub for monitoring its health and performance in production.
Related to #11950 --- additional diagnostics to help figure out what's happening