connection: fix O(n^2) reply parsing. #207
The data method in the Connection object handles incoming data chunks by concatenating them with a Buffer object. If the data chunks are each of bounded size, then this algorithm is O(n^2) in the total size of the reply. This leads to excessive CPU consumption and hangs when the replies are large -- reading a 40 MB BYTEA causes a very long pause when tested here. This commit introduces an efficient double-ended queue, with O(n) push and shift operations, and uses that instead of allocating a new concatenated Buffer for each chunk.
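The PR's actual implementation lives in lib/byte_queue.js (linked later in this thread); as a rough illustration of the idea only, not the PR's code, a queue of buffers copies each byte once, when it is shifted out:

```js
// Illustrative sketch, not the PR's code (see lib/byte_queue.js for the real
// one). Pushed chunks are stored as-is, so assembling a reply from N chunks
// costs O(total bytes) instead of O(total bytes * N).
class ByteQueue {
  constructor() {
    this.chunks = [] // queued Buffers, oldest first
    this.length = 0  // total queued bytes
  }

  push(chunk) {
    this.chunks.push(chunk)
    this.length += chunk.length
  }

  // Remove and return the first n bytes as a single Buffer.
  shift(n) {
    const out = Buffer.allocUnsafe(n)
    let offset = 0
    while (offset < n) {
      const head = this.chunks[0]
      const take = Math.min(head.length, n - offset)
      head.copy(out, offset, 0, take)
      if (take === head.length) this.chunks.shift()
      else this.chunks[0] = head.slice(take)
      offset += take
    }
    this.length -= n
    return out
  }
}
```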
Hi @dlbeer I tried your fork with https://github.com/porsager/postgres-benchmarks and found it to be 50-100% slower than current master. There are no big "bytea" responses in that test, but maybe you could add a benchmark for that too, so we can see if it's faster in practice. Even so, I wouldn't want to sacrifice performance for the 99% of use cases to improve 40 MB bytea responses.
On Tue, Aug 03, 2021 at 04:27:45AM -0700, Rasmus Porsager wrote:
> I tried your fork with https://github.com/porsager/postgres-benchmarks and found it to be 50-100% slower than current master.
> There are no big "bytea" responses in that test, but maybe you could add a benchmark for that too, so we can see if it's faster in practice. Even so, I wouldn't want to sacrifice performance for the 99% of use cases to improve 40 MB bytea responses.
Not sure if it was clear from my original description, but this isn't
specific to BYTEA, and it's not merely a special-case performance
improvement (that's just how I came across it).
If a malicious user is able to control the size of a response from the
database server, they can lock up the process for hours at a time.
I probably don't have time at the moment to optimize the ByteQueue
implementation myself, but I'd be happy to answer questions or check
over improvements that anyone else has to contribute.
Alright. Would be very nice if you could write a reproduction case for when this lock happens.
On Wed, Aug 04, 2021 at 12:13:46AM -0700, Rasmus Porsager wrote:
> Alright.
> Would be very nice if you could write a reproduction case for when this lock happens.
Do you mean add something to the benchmark suite, or just describe it?
The simplest way to trigger it is to create a table with a BYTEA or TEXT
column, store a large value (tens of MB, perhaps?), then SELECT it.
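As a rough sketch of that reproduction (table name and connection string are
made up here), something like the following shows the pause on an unpatched
version:

```js
import postgres from 'postgres'

const sql = postgres('postgres://localhost/test') // hypothetical connection string

// A single wide TEXT value is enough; the reply arrives in many small
// chunks, each of which triggers a full re-concatenation of the buffer.
await sql`CREATE TABLE IF NOT EXISTS big (id int, data text)`
await sql`INSERT INTO big VALUES (1, ${'x'.repeat(40 * 1024 * 1024)})`

console.time('select')
await sql`SELECT data FROM big WHERE id = 1` // a very long pause pre-fix
console.timeEnd('select')

await sql.end()
```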
No, just describe it like that 😉 Now I can take a better look at it. Thanks!
Not sure if it can help, but maybe we could pre-allocate the array passed to `backend`:

```js
let next_data = null

function data(x) {
  if (next_data) {
    // Grow the view over the pre-allocated message buffer by as many
    // bytes of x as fit.
    const new_length = Math.min(next_data.byteLength + x.byteLength, next_data.buffer.byteLength)
    const to_read_from_x = new_length - next_data.byteLength
    next_data = new Uint8Array(
      next_data.buffer,
      next_data.byteOffset,
      new_length
    );
    next_data.set(x.subarray(0, to_read_from_x), new_length - to_read_from_x)
    if (new_length < next_data.buffer.byteLength)
      return
    // next_data now holds the full `length + 1` bytes of the message
    backend[next_data[0]](Buffer.from(next_data.buffer, next_data.byteOffset, new_length))
    next_data = null
    x = x.slice(to_read_from_x)
  }
  buffer = buffer.length === 0
    ? x
    : Buffer.concat([buffer, x], buffer.length + x.length)
  while (buffer.length > 4) {
    length = buffer.readInt32BE(1)
    if (length >= buffer.length) {
      if (length > 65535) { // heuristic (should be based on benchmark results)
        // Pre-allocate the whole message up front and copy in what we
        // already have; the view's length tracks the bytes filled so far.
        next_data = Buffer.allocUnsafe(length + 1)
        next_data = new Uint8Array(
          next_data.buffer,
          next_data.byteOffset,
          buffer.byteLength
        );
        next_data.set(buffer, 0)
        buffer = Buffer.alloc(0)
      }
      break
    }
    backend[buffer[0]](buffer.slice(0, length + 1))
    buffer = buffer.slice(length + 1)
  }
}
```

(not tested) This is kind of a mix of previous solutions (avoid many …)
On Wed, Aug 04, 2021 at 03:06:36PM -0700, Minigugus wrote:
> buffer = buffer.length === 0
>   ? x
>   : Buffer.concat([buffer, x], buffer.length + x.length)
This is the part that causes the problem. What happens is that incoming
data arrives in chunks of no more than some fixed size (k).
If you need to assemble a sequence of N chunks before you have a
complete reply, then you have a sequence of N calls to Buffer.concat,
and in each case you're concatenating all the data you've received so
far with k more bytes. The i-th concatenation copies i*k bytes, so you
end up copying overall:
k*(1 + 2 + ... + N) = k*N*(N+1)/2 = (k/2)*N^2 + (k/2)*N bytes
The first term (k/2*N^2) is what causes the problem. For every doubling
of the message size, you end up doing (roughly) four times as much work.
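To make that concrete, here is a tiny standalone script (not from the PR; the numbers are chosen to mimic the 40 MB case) that counts the bytes copied:

```js
// Standalone illustration: repeatedly concatenating fixed-size chunks
// copies O(N^2) bytes in total.
const k = 64 * 1024        // chunk size
const N = 640              // 640 * 64 KiB = 40 MiB reply
const chunk = Buffer.alloc(k)

let buffer = Buffer.alloc(0)
let copied = 0
for (let i = 0; i < N; i++) {
  buffer = Buffer.concat([buffer, chunk]) // copies buffer.length + k bytes
  copied += buffer.length
}
console.log(copied) // 13442744320, i.e. ~13.4 GB of copying for a 40 MiB reply
```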
@dlbeer You're right, but I think most of the time … Moreover, instead of `buffer = buffer.slice(length + 1)`, your code calls `buffer.shift(length + 1)`, which [allocates a new buffer every time](https://github.com/porsager/postgres/blob/04c93b72f68b4193d29e1bdbeef940feee1fe0ed/lib/byte_queue.js#L94), whereas `.slice()` [reuses the same buffer, only offsets are updated](https://nodejs.org/api/buffer.html#buffer_buf_slice_start_end). I haven't tested your code, so there are a lot of assumptions in this comment; I might be wrong on some points. My hypothesis is that the current implementation is better for small packets, but yours is better for big packets. That's why I suggested the code in my previous comment, so that we can let the benchmarks be our judge.
On Wed, Aug 04, 2021 at 04:16:50PM -0700, Minigugus wrote:
> Moreover, instead of `buffer = buffer.slice(length + 1)`, your code calls `buffer.shift(length + 1)`, which [allocates a new buffer every time](https://github.com/porsager/postgres/blob/04c93b72f68b4193d29e1bdbeef940feee1fe0ed/lib/byte_queue.js#L94), whereas `.slice()` [reuses the same buffer, only offsets are updated](https://nodejs.org/api/buffer.html#buffer_buf_slice_start_end).
Yes, that's correct.
A small and simple optimization (which I haven't benchmarked) might be
to keep a pre-allocated 5-byte header buffer and use shiftTo instead.
This would avoid a buffer allocation on each loop iteration.
Another possibility is to parse the header only once, when the buffer
first grows above 5 bytes, and cache the length and type fields for
when the full message arrives.
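A hypothetical sketch of that second idea (`queue.peek` is an assumed helper, not something the PR provides):

```js
// Hypothetical sketch: parse the 5-byte header (type + length) once,
// cache it, and only touch the queue again when the whole message is in.
let type = -1
let length = -1 // cached message length; -1 means "header not parsed yet"

function data(x) {
  queue.push(x)
  for (;;) {
    if (length < 0) {
      if (queue.length < 5) return
      const header = queue.peek(5) // assumed helper: copy without consuming
      type = header[0]
      length = header.readInt32BE(1)
    }
    if (queue.length < length + 1) return  // full message not queued yet
    backend[type](queue.shift(length + 1)) // single allocation per message
    length = -1
  }
}
```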
I used the 187 MB json file from here, and tested load time and select time in both psql and postgres.js, with the following script:

```js
import postgres from "postgres";
import fs from "fs";

const sql = postgres(process.env.DB_URL);

async function select() {
  const s = performance.now();
  await sql`SELECT data from "data" LIMIT 1`;
  print(performance.now() - s);
}

async function insert() {
  const content = fs.readFileSync(`${process.cwd()}/../sf-city-lots-json/citylots.json`, 'utf-8');
  const s = performance.now();
  await sql`INSERT INTO "data" (id, "data") VALUES (5, ${content})`;
  return print(performance.now() - s);
}

function print(ms) {
  console.log(`Time - ${ms}ms; ${ms / 1000}s; ${ms / 60000}min`);
}

async function main() {
  await insert();
  await select();
}

main().finally(() => sql.end());
```

Conclusion: Insert time in postgres.js is the same as in psql (I'm impressed!) but select is about 100 times slower.
@stalniy psql doesn't parse the json. Could you try to see how long the `JSON.parse(x)` call takes on its own? Unless the column is of type …
I tried to replace `JSON.parse(x)` with just `x` … I actually did a bit of investigation, and the root cause I think is the repeated buffer concatenation discussed above.
@porsager If you replace the buffer concat logic with BufferList, specifically the dependency-free implementation at https://github.com/rvagg/bl/blob/master/BufferList.js, then performance for large payloads gets close to the theoretical max: 50x faster than right now, and ~2x slower than …
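A rough sketch of that substitution, using the BufferList API from the linked file (the module path and integration details are assumed):

```js
// Rough sketch: BufferList appends chunks without copying and can read
// and slice across chunk boundaries, so each message is copied only once.
const BufferList = require('bl/BufferList') // assumed import path

const bl = new BufferList()

function data(x) {
  bl.append(x)                       // stores the chunk, no concatenation
  while (bl.length > 4) {
    const length = bl.readInt32BE(1) // reads across chunk boundaries
    if (length >= bl.length) break
    backend[bl.get(0)](bl.slice(0, length + 1)) // one copy per message
    bl.consume(length + 1)           // drop the message from the front
  }
}
```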
Ok, just took some time to look properly at this, and @dlbeer's issue is right on the money. Sorry for not digging into it sooner, and thanks a lot for pointing it out!! I've got a working fix that doesn't introduce any dependencies or big abstractions, and no performance loss for normal usage. It seems to be the same speed as psql or faster, so I guess we just got about an 80x gain with @stalniy's example 😂 You can try it out here: #229

@stalniy it's because we're receiving a single really big message in many small chunks, and for every chunk we created a new buffer by concatenating what we had with the next chunk. In the fix we just push all chunks to an array until we're done, and then create a buffer from that array.
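A minimal sketch of that approach (names assumed; see #229 for the actual fix):

```js
// Minimal sketch, not the actual #229 code: collect chunks in an array and
// concatenate at most twice per message (once to read the header, once when
// the message is complete), which is O(n) in the message size.
let chunks = []
let queued = 0
let expected = -1 // full message size incl. type byte; -1 = header unknown

function data(x) {
  chunks.push(x)
  queued += x.length
  for (;;) {
    if (expected < 0) {
      if (queued < 5) return
      const head = chunks.length === 1 ? chunks[0] : Buffer.concat(chunks, queued)
      chunks = [head]
      expected = head.readInt32BE(1) + 1
    }
    if (queued < expected) return
    const buffer = chunks.length === 1 ? chunks[0] : Buffer.concat(chunks, queued)
    backend[buffer[0]](buffer.slice(0, expected))
    chunks = [buffer.slice(expected)] // keep any bytes of the next message
    queued -= expected
    expected = -1
  }
}
```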