Skip to content

feat: collect database metrics #17635

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

feat: collect database metrics #17635

wants to merge 4 commits into from

Conversation

dannykopping
Copy link
Contributor

Currently we don't have a way to get insight into Postgres connections being exhausted.

By using the prometheus' DBStats collector, we get some insight out-of-the-box.

# HELP go_sql_idle_connections The number of idle connections.
# TYPE go_sql_idle_connections gauge
go_sql_idle_connections{db_name="coder"} 1
# HELP go_sql_in_use_connections The number of connections currently in use.
# TYPE go_sql_in_use_connections gauge
go_sql_in_use_connections{db_name="coder"} 2
# HELP go_sql_max_idle_closed_total The total number of connections closed due to SetMaxIdleConns.
# TYPE go_sql_max_idle_closed_total counter
go_sql_max_idle_closed_total{db_name="coder"} 112
# HELP go_sql_max_idle_time_closed_total The total number of connections closed due to SetConnMaxIdleTime.
# TYPE go_sql_max_idle_time_closed_total counter
go_sql_max_idle_time_closed_total{db_name="coder"} 0
# HELP go_sql_max_lifetime_closed_total The total number of connections closed due to SetConnMaxLifetime.
# TYPE go_sql_max_lifetime_closed_total counter
go_sql_max_lifetime_closed_total{db_name="coder"} 0
# HELP go_sql_max_open_connections Maximum number of open connections to the database.
# TYPE go_sql_max_open_connections gauge
go_sql_max_open_connections{db_name="coder"} 10
# HELP go_sql_open_connections The number of established connections both in use and idle.
# TYPE go_sql_open_connections gauge
go_sql_open_connections{db_name="coder"} 3
# HELP go_sql_wait_count_total The total number of connections waited for.
# TYPE go_sql_wait_count_total counter
go_sql_wait_count_total{db_name="coder"} 28
# HELP go_sql_wait_duration_seconds_total The total time blocked waiting for a new connection.
# TYPE go_sql_wait_duration_seconds_total counter
go_sql_wait_duration_seconds_total{db_name="coder"} 0.086936235

go_sql_wait_count_total is the metric I'm most interested in gaining, but the others are also very useful.

Changing the prefix is easy (prometheus.WrapRegistererWithPrefix), but getting rid of the go_ segment is not quite so easy. I've kept the changeset small for now.

NOTE: I imported a library to determine the database name from the given conn string. It's not as simple as one might hope. The database name is used for the db_name label.

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
@dannykopping dannykopping changed the title feat: use go's sqldatabase metrics feat: collect database metrics May 1, 2025
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be done instead in enablePrometheus().
I don't see a point registering the collector if prometheus is not enabled for the deployment.

go.mod Outdated
@@ -489,6 +489,7 @@ require (
require (
github.com/coder/preview v0.0.1
github.com/fsnotify/fsnotify v1.9.0
github.com/jackc/pgx/v5 v5.7.4
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New dependencies aren't too huge, but still insane for just parsing a DSN. I didn't find any other lib as rigorous as this one; open to suggestions.

$ gsa /home/coder/coder/build/coder_2.21.3-devel+9645ed314_linux_amd64 2>/dev/null | grep jackc
│ 0.07%   │ github.com/jackc/pgx/v5                                                             │ 202 kB │ vendor    │
│ 0.00%   │ github.com/jackc/pgservicefile                                                      │ 3.0 kB │ vendor    │
│ 0.00%   │ github.com/jackc/pgpassfile                                                         │ 2.8 kB │ vendor    │

(source)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much should we care about getting an accurate db name? Some? Just hardcode "coder" and call it a day?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially thinking that, but I suspect seeing db_name=coder as a label when they're configured it as coder_com_pg or whatever would cause unnecessary confusion.

It's also possible that multiple coder installations (with separate DBs) exist in the same infra.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible that multiple coder installations (with separate DBs) exist in the same infra.

True enough, but the dbname is not enforced to be unique across the infra, it's local to a particular DB server instance. So you can already have multiple Coder deployments with separate DBs with the same label.

I suspect seeing db_name=coder as a label when they're configured it as coder_com_pg or whatever would cause unnecessary confusion.

What happens if we set it to ""?

Copy link
Contributor Author

@dannykopping dannykopping May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go_sql_idle_connections{db_name=""} 3

Prometheus will drop the empty label.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't hate that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, me neither. Cool, lemme refactor 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is assuming whatever scrapes the metrics augments the metrics to add pod/namespace/container labels. Without that, it might be hard to triangulate where these metrics are coming from in a complex environment without some reference to coder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True for every one of our generic go_ metrics, unfortunately.

I've never heard of a scraping system that doesn't augment with additional labels...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's see how it's received. Easy to change later.
Made the change.

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
@@ -739,6 +739,10 @@ func (r *RootCmd) Server(newAPI func(context.Context, *coderd.Options) (*coderd.
_ = sqlDB.Close()
}()

if options.DeploymentValues.Prometheus.Enable {
options.PrometheusRegistry.MustRegister(collectors.NewDBStatsCollector(sqlDB, ""))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest adding a comment with our rationale for setting dbName to an empty string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants