
Improve performance by splitting parsing steps into different transforms #2958


Status: Closed
rluvaton wants to merge 2 commits

Conversation


@rluvaton commented Apr 13, 2023

Improve the performance of the parser by splitting the parse function into 3 steps:

  1. split buffers into messages (each message then goes to the next transform)
  2. parse each message
  3. call the callback

This improves performance because it adds backpressure and reduces event-loop lag, since each message is parsed asynchronously (see the sketch below).

I'd like your opinion before cleaning up the code duplication.

Also, I noticed that this package still supports Node versions 10 and 12, which have been end-of-life for a long time; that prevents using async iterators with pipeline.
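A minimal sketch of the proposed three-stage split, assuming Node's stream.pipeline; the framing logic and the parseMessage, onMessage, and socket names here are illustrative assumptions, not the PR's actual code:

const { pipeline, Transform, Writable, PassThrough } = require('stream')

// Stage 1: split raw buffers into whole protocol messages. PostgreSQL
// backend messages are a 1-byte type code plus a 4-byte length that
// counts itself but not the code byte, hence the `length + 1` math.
const splitToMessages = new Transform({
  readableObjectMode: true,
  transform(chunk, _enc, done) {
    this.pending = this.pending ? Buffer.concat([this.pending, chunk]) : chunk
    while (this.pending.length >= 5) {
      const length = this.pending.readUInt32BE(1)
      if (this.pending.length < length + 1) break
      this.push(this.pending.subarray(0, length + 1))
      this.pending = this.pending.subarray(length + 1)
    }
    done()
  },
})

// Stage 2: parse one framed message at a time.
const parseMessages = new Transform({
  objectMode: true,
  transform(message, _enc, done) {
    done(null, parseMessage(message))
  },
})

// Stage 3: hand each parsed message to the consumer's callback.
const callCallback = new Writable({
  objectMode: true,
  write(parsed, _enc, done) {
    onMessage(parsed)
    done()
  },
})

// parseMessage and onMessage are stand-ins for the real parser and callback.
function parseMessage(buf) {
  return { code: buf.readUInt8(0), length: buf.readUInt32BE(1), bytes: buf.subarray(5) }
}
const onMessage = (msg) => console.log('message code', msg.code)

// socket stands in for the connection's net.Socket (any readable byte stream).
const socket = new PassThrough()

pipeline(socket, splitToMessages, parseMessages, callCallback, (err) => {
  if (err) console.error('pipeline failed', err)
})

Whether the Transform machinery's overhead outweighs the framing win is exactly what the benchmarks later in this thread ended up measuring.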

offset: offset + HEADER_LENGTH,
code,
length,
bytes: this.buffer
@rluvaton (Author), commenting on the diff excerpt above:

This will probably lead to a bug: we pass the buffer by reference, and if it's modified before processing completes, that can cause errors.
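One possible way to avoid that (a sketch of the idea, not the PR's actual fix; this.push and the slice bounds are assumptions based on the excerpt): copy the message bytes out of the reused buffer before emitting them.

// Copying detaches the message from the shared parse buffer, so later
// writes into this.buffer can no longer corrupt an in-flight message.
const bytes = Buffer.from(this.buffer.subarray(offset, offset + length))
this.push({ offset: offset + HEADER_LENGTH, code, length, bytes })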

@charmander (Collaborator) commented Apr 17, 2023

> This improves performance because it adds backpressure and reduces event-loop lag, since each message is parsed asynchronously

Do you have benchmarks? I don’t find that explanation convincing, since parsing is synchronous and the callback doesn’t have a mechanism to turn anything asynchronous into backpressure.

@rluvaton (Author)

After benchmarking, I found that it's not really helpful 😢

But I found another optimization that gave a 500 ms boost in the parseRow function:

parseRow(rowData) {

From my benchmarking (look at the total time):

base: [profiler screenshot]

after my change (look at the anonymous function): [profiler screenshot]

@rluvaton closed this Apr 18, 2023
@rluvaton (Author)

I want to discuss this potential improvement with someone; where can I do that?

Cc: @charmander @brianc

@charmander (Collaborator)

In a pull request, or an issue, or a comment.

@rluvaton (Author) commented Apr 18, 2023

So basically, the idea is to replace the

parseRow(rowData) {

implementation with a generated function, built with new Function inside the addFields function.

So for this table:

create table if not exists a_table_with_lot_of_data
(
    id        serial primary key,
    name      varchar(255),
    number_1  int,
    number_2  int,
    number_3  int,
    number_4  int,
    number_5  int,
    number_6  int,
    number_7  int,
    number_8  int,
    number_9  int,
    number_10 int,
    boolean_1 boolean,
    boolean_2 boolean,
    boolean_3 boolean,
    json_1    json,
    json_2    json,
    json_3    json,
    json_4    json,
    json_5    json,
    json_6    json,
    date_1    timestamp,
    date_2    timestamp,
    date_3    timestamp,
    date_4    timestamp,
    date_5    timestamp,
    date_6    timestamp,
    date_7    timestamp,
    date_8    timestamp,
    date_9    timestamp,
    date_10   timestamp
);

and this query:

select * from a_table_with_lot_of_data;

it will generate something like this:

(function anonymous(rowData) {
  return {
    id: rowData[0] !== null ? parseInt(rowData[0], 10) : null,
    name: rowData[1] !== null ? String(rowData[1]) : null,
    number_1: rowData[2] !== null ? parseInt(rowData[2], 10) : null,
    number_2: rowData[3] !== null ? parseInt(rowData[3], 10) : null,
    number_3: rowData[4] !== null ? parseInt(rowData[4], 10) : null,
    number_4: rowData[5] !== null ? parseInt(rowData[5], 10) : null,
    number_5: rowData[6] !== null ? parseInt(rowData[6], 10) : null,
    number_6: rowData[7] !== null ? parseInt(rowData[7], 10) : null,
    number_7: rowData[8] !== null ? parseInt(rowData[8], 10) : null,
    number_8: rowData[9] !== null ? parseInt(rowData[9], 10) : null,
    number_9: rowData[10] !== null ? parseInt(rowData[10], 10) : null,
    number_10: rowData[11] !== null ? parseInt(rowData[11], 10) : null,
    boolean_1: rowData[12] !== null ? rowData[12] === 't' || rowData[12] === 'TRUE' || rowData[12] === 'true' || rowData[12] === 'y' || rowData[12] === 'yes' || rowData[12] === 'on' || rowData[12] === '1' : null,
    boolean_2: rowData[13] !== null ? rowData[13] === 't' || rowData[13] === 'TRUE' || rowData[13] === 'true' || rowData[13] === 'y' || rowData[13] === 'yes' || rowData[13] === 'on' || rowData[13] === '1' : null,
    boolean_3: rowData[14] !== null ? rowData[14] === 't' || rowData[14] === 'TRUE' || rowData[14] === 'true' || rowData[14] === 'y' || rowData[14] === 'yes' || rowData[14] === 'on' || rowData[14] === '1' : null,
    json_1: rowData[15] !== null ? JSON.parse(rowData[15]) : null,
    json_2: rowData[16] !== null ? JSON.parse(rowData[16]) : null,
    json_3: rowData[17] !== null ? JSON.parse(rowData[17]) : null,
    json_4: rowData[18] !== null ? JSON.parse(rowData[18]) : null,
    json_5: rowData[19] !== null ? JSON.parse(rowData[19]) : null,
    json_6: rowData[20] !== null ? JSON.parse(rowData[20]) : null,
    date_1: rowData[21] !== null ? String(rowData[21]) : null,
    date_2: rowData[22] !== null ? String(rowData[22]) : null,
    date_3: rowData[23] !== null ? String(rowData[23]) : null,
    date_4: rowData[24] !== null ? String(rowData[24]) : null,
    date_5: rowData[25] !== null ? String(rowData[25]) : null,
    date_6: rowData[26] !== null ? String(rowData[26]) : null,
    date_7: rowData[27] !== null ? String(rowData[27]) : null,
    date_8: rowData[28] !== null ? String(rowData[28]) : null,
    date_9: rowData[29] !== null ? String(rowData[29]) : null,
    date_10: rowData[30] !== null ? String(rowData[30]) : null
  };
})

That generated function is then executed inside the parseRow function.
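For reference, a sketch of how addFields could build such a parser with new Function; buildRowParser is a hypothetical name, fields is assumed to be pg's result field descriptors (each with name and dataTypeID), and getTypeParser stands in for the type-parser lookup in pg-types:

const { getTypeParser } = require('pg-types')

function buildRowParser(fields) {
  // One `name: parsed-or-null` property per column, by positional index.
  const props = fields
    .map((f, i) => `${JSON.stringify(f.name)}: rowData[${i}] !== null ? parsers[${i}](rowData[${i}]) : null`)
    .join(',\n      ')
  // The type parsers are passed in as a closure argument rather than
  // inlined as source text, so only the object shape is generated code.
  const factory = new Function('parsers', `
    return function parseRow(rowData) {
      return {
        ${props}
      };
    };
  `)
  return factory(fields.map((f) => getTypeParser(f.dataTypeID)))
}

Escaping each column name with JSON.stringify keeps a hostile column name from turning into executable code, which is the class of injection issue mentioned later in this thread.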

When running the bench.js file without any changes, I got:

Details

warmup done

little queries: 3510
qps 702
on my laptop best so far seen 733 qps

sequence queries: 12200
qps 2440
on my laptop best so far seen 1309 qps

insert queries: 77241
qps 15448.2
on my laptop best so far seen 6445 qps

Warming up bytea test
bytea warmup done
bytea time: 402 ms
bytea length: 104857600 bytes
on my laptop best so far seen 1107ms and 104857600 bytes

little queries: 3380
qps 676
on my laptop best so far seen 733 qps

sequence queries: 11943
qps 2388.6
on my laptop best so far seen 1309 qps

insert queries: 72755
qps 14551
on my laptop best so far seen 6445 qps

Warming up bytea test
bytea warmup done
bytea time: 716 ms
bytea length: 104857600 bytes
on my laptop best so far seen 1107ms and 104857600 bytes

little queries: 3517
qps 703.4
on my laptop best so far seen 733 qps

sequence queries: 12192
qps 2438.4
on my laptop best so far seen 1309 qps

insert queries: 68964
qps 13792.8
on my laptop best so far seen 6445 qps

Warming up bytea test
bytea warmup done
bytea time: 1084 ms
bytea length: 104857600 bytes
on my laptop best so far seen 1107ms and 104857600 bytes

little queries: 3416
qps 683.2
on my laptop best so far seen 733 qps

sequence queries: 11087
qps 2217.4
on my laptop best so far seen 1309 qps

insert queries: 52893
qps 10578.6
on my laptop best so far seen 6445 qps

Warming up bytea test
bytea warmup done
bytea time: 1443 ms
bytea length: 104857600 bytes
on my laptop best so far seen 1107ms and 104857600 bytes

And after my change I got:

Details

start
warmup done

little queries: 2837
qps 567.4
on my laptop best so far seen 733 qps

sequence queries: 12473
qps 2494.6
on my laptop best so far seen 1309 qps

insert queries: 69321
qps 13864.2
on my laptop best so far seen 6445 qps

Warming up bytea test
bytea warmup done
bytea time: 404 ms
bytea length: 104857600 bytes
on my laptop best so far seen 1107ms and 104857600 bytes

little queries: 3546
qps 709.2
on my laptop best so far seen 733 qps

sequence queries: 14820
qps 2964
on my laptop best so far seen 1309 qps

insert queries: 70378
qps 14075.6
on my laptop best so far seen 6445 qps

Warming up bytea test
bytea warmup done
bytea time: 728 ms
bytea length: 104857600 bytes
on my laptop best so far seen 1107ms and 104857600 bytes

little queries: 3548
qps 709.6
on my laptop best so far seen 733 qps

sequence queries: 14947
qps 2989.4
on my laptop best so far seen 1309 qps

insert queries: 70638
qps 14127.6
on my laptop best so far seen 6445 qps

Warming up bytea test
bytea warmup done
bytea time: 1098 ms
bytea length: 104857600 bytes
on my laptop best so far seen 1107ms and 104857600 bytes

little queries: 3461
qps 692.2
on my laptop best so far seen 733 qps

sequence queries: 13298
qps 2659.6
on my laptop best so far seen 1309 qps

insert queries: 50437
qps 10087.4
on my laptop best so far seen 6445 qps

Warming up bytea test
bytea warmup done
bytea time: 1462 ms
bytea length: 104857600 bytes
on my laptop best so far seen 1107ms and 104857600 bytes

WDYT?

@charmander (Collaborator)

pg used to do that, but it was removed. Even implementing it without the vulnerability, I don’t like dynamic code generation for this.

@rluvaton (Author) commented Apr 18, 2023

But the vulnerability was fixed; why was the optimization removed, then?

48543bf

@charmander (Collaborator)

Sorry, I misremembered when it was removed. See #1603.

@rluvaton (Author)

If I bring it back and fix the memory leak mentioned in #1603, how likely is it to be merged?
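For what it's worth, one common shape for such a workaround is to memoize the generated parser by field signature, so each distinct row shape compiles only once; this is an illustrative assumption about the fix, not a summary of #1603:

// Cache keyed by the result's field layout; getRowParser is a hypothetical
// helper reusing buildRowParser from the sketch above. A real fix would
// also need to bound or evict this cache, since an unbounded Map of
// generated functions is itself a leak for apps with many result shapes.
const rowParserCache = new Map()

function getRowParser(fields) {
  const key = fields.map((f) => `${f.name}\u0000${f.dataTypeID}`).join('\u0001')
  let parseRow = rowParserCache.get(key)
  if (!parseRow) {
    parseRow = buildRowParser(fields)
    rowParserCache.set(key, parseRow)
  }
  return parseRow
}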

@charmander (Collaborator)

Not completely sure about brianc’s thoughts on it today, but I don’t think the marginal performance improvement is worth having dynamic code (and some complicated leak workaround). There are probably lots of other areas to optimize before resorting to that.
