Skip to content

fix: move pubsub publishing out of database transactions to avoid conn exhaustion #17648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

dannykopping
Copy link
Contributor

@dannykopping dannykopping commented May 1, 2025

Database transactions hold onto connections, and pubsub.Publish tries to acquire a connection of its own. If the latter is called within a transaction, this can lead to connection exhaustion.

I plan two follow-ups to this PR:

  1. Make connection counts tuneable

https://github.com/coder/coder/blob/main/cli/server.go#L2360-L2376

We will then be able to write tests showing how connection exhaustion occurs.

  1. Write a linter/ruleguard to prevent pubsub.Publish from being called within a transaction.

…n starvation

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
@dannykopping dannykopping force-pushed the dk/notify-outside-tx branch from 6efe39f to ba2f90a Compare May 1, 2025 18:49
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
@dannykopping dannykopping marked this pull request as ready for review May 2, 2025 09:56
clock: clock,
registerer: registerer,
done: make(chan struct{}, 1),
provisionNotifyCh: make(chan database.ProvisionerJob, 10),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumb-suck; I want to protect against temporary network/database blips but also don't want to accumulate too many messages.

…ot affect behaviour

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
func (*brokenPublisher) Publish(event string, _ []byte) error {
// I'm explicitly _not_ checking for EventJobPosted (coderd/database/provisionerjobs/provisionerjobs.go) since that
// requires too much knowledge of the underlying implementation.
return xerrors.Errorf("refusing to publish %q", event)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fail slowly instead?

e.g. <-time.After(testutil.IntervalFast)

@evgeniy-scherbina
Copy link
Contributor

evgeniy-scherbina commented May 2, 2025

@dannykopping is it possible to use another approach, and allow calling Publish from within existing transaction?
For ex.: add smth like this:

func (p *PGPubsub) PublishWithinTx(event string, message []byte, tx database.Store) error {
   tx.ExecContext(context.Background(), `select pg_notify(`+pq.QuoteLiteral(event)+`, $1)`, message)
}

Implementations which doesn't support txs may implement it:

func (p *InMemPubSub) PublishWithinTx(event string, message []byte, tx database.Store) error {
    _ = tx // ignored
   return p.PublishTx(event, message)
}

But it will require some changes in database.Store and PGPubSub


Because we impose restrictions on ourselves, every time we need to use Publish within tx - we have to come up with workaround like in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants