Skip to content

fix: use ephemeral port for postgres, except for TestPubsub_Disconnect #7798

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 5, 2023

Conversation

spikecurtis
Copy link
Contributor

Fixes #7752

Our implementation of starting postgres on ephemeral ports is racy: it

  1. opens a TCP listener with port 0, which asks the OS to allocate a free port
  2. stores the port number
  3. starts a Docker container with that port as the host mapping

However, between 2 and 3 another process can come in and snatch the port.

Docker itself accepts 0 as a port mapping, and it asks the OS to allocate a free port. This closes the race.

A second issue is that we were using an ephemeral port for TestPubsub_Disconnect where we kill the postgres container and then restart it on the same port. It suffers from a similar race, where another process could snatch the port while postgres is down. To solve this, we use a hardcoded, arbitrary high port number, but one outside the Linux ephemeral range.

Signed-off-by: Spike Curtis <spike@coder.com>
@spikecurtis spikecurtis requested review from coadler, kylecarbs and a team June 2, 2023 09:49
@spikecurtis spikecurtis changed the title fix: ask OS for postgres ephemeral port; don't use ephemeral port for TestPubsub_Disconnect fix: ask OS for postgres ephemeral port; don't use for TestPubsub_Disconnect Jun 2, 2023
@spikecurtis spikecurtis changed the title fix: ask OS for postgres ephemeral port; don't use for TestPubsub_Disconnect fix: use ephemeral port for postgres except for TestPubsub_Disconnect Jun 2, 2023
@spikecurtis spikecurtis changed the title fix: use ephemeral port for postgres except for TestPubsub_Disconnect fix: use ephemeral port for postgres, except for TestPubsub_Disconnect Jun 2, 2023
// and restart it on the same port. If we use an ephemeral port, there is a chance the OS will reallocate before we
// start back up. The downside is that if the test crashes and leaves the container up, subsequent test runs will fail
// until we manually kill the container.
const disconnectTestPort = 26892
Copy link
Member

@mafredri mafredri Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this, by default, usually falls outside the dynamic port range of :0, so we won't collide with other tests/test packages. Are the rules same on Windows?

At least on my NAS, it seems OK:

❯ cat /proc/sys/net/ipv4/ip_local_port_range
32768	60999

Assuming these values are set differently, there could still be port collisions.

Something like port := testutil.Port(26892), where that function takes the input as preferred, but falls back to seeking the closest free port would help in the case mentioned in the comment. The package can store "used ports", unfortunately I don't think that state will be shared between test packages (could be wrong though).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding Windows, we don't run these tests there. Presumably because Docker on Windows is more effort/overhead than it's worth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using testutil.Port() is clever, but thinking about it a bit, I don't think we should be that clever in tests. If the test leaks the container, I'd rather us know about it by failing tests, versus doing something different and rare.

@spikecurtis spikecurtis merged commit e016c30 into main Jun 5, 2023
@spikecurtis spikecurtis deleted the spike/7752-port-binding-flake branch June 5, 2023 05:24
@github-actions github-actions bot locked and limited conversation to collaborators Jun 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestPubsub_Disconnect flake on port binding
3 participants