Skip to content

Indefinitely waiting queriesΒ #1089

@egg-juxt

Description

@egg-juxt

Hi. We are experiencing an issue in a cloud deployment, where an app using the PostgresJS library is occasionally executing insert statements that block indefinitely, i.e. the await sql... never returns, i.e. the Promise is never settled, and therefore the program hangs.

We suspect that the connection has been dropped without the client app being notified (a so called half-open connection). The client doesn't receive any responses from the server.

I have been able to reproduce the problem locally, by executing a tc command that starts dropping all packets going through my loopback network interface.

The scenario is as follows:

  1. I start a standard docker PostgreSQL server running on local port 5432
  2. I start my NodeJS program, see code below. It starts inserting records.
  3. I run sudo tc qdisc add dev lo root netem loss 100% to start dropping packets, provoking the half-open connection.
  4. The currently executing insert statement awaits indefinitely.
import postgres from 'postgres';
import * as timers from 'node:timers/promises';
import net from 'net';

async function main() {
  const sql = postgres({
    host: 'localhost',
    port: 5432,
    database: 'postgres',
    username: 'postgres',
    password: 'postgres',
    connect_timeout: 10,
    idle_timeout: 10,
    keep_alive: 10,
  });

  for (let i = 0; i < 1000; i++) {
    console.log(`inserting ${i}...`);
    try {
      await sql`insert into articles (title, author, content) values ('title', 'author', 'content')`;
    } catch (err) {
      console.error('failed insert', i);
    }
    await timers.setTimeout(1000);
  }

}

main().then(() => console.log('finished')).catch(console.error);

Solution found:
I have found a way to detect and destroy a half-open connection by using the socket option of the library and setting a lower-level socket timeout:

  const sql = postgres({
   ...
    socket: () => {
      const s = new net.Socket();
      s.connect(5432, 'localhost');
      s.setTimeout(12000, () => {
        console.warn("destroying idle connection");
        s.resetAndDestroy();
      })
      return s;
    },
    ...
  }

I'm wondering why the keep_alive option of PostgresJS is not having an effect for detecting such half-open connections.

Any thought/suggestion?

Thanks in advance!

(Note: we are actually using XTDB as the database, through its postgres-compatible interface, but I have run the tests above with a standard PostgreSQL).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions