improve performance by splitting parse steps into different transforms #2958
Conversation
```js
offset: offset + HEADER_LENGTH,
code,
length,
bytes: this.buffer
```
This will probably lead to a bug: we pass the buffer by reference, and if it is modified before processing of the message completes, this can lead to errors.
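For example, a hypothetical sketch of the aliasing hazard (not code from this patch):

```js
// A reused parser buffer shared by reference: later writes into the buffer
// corrupt any message view a consumer is still holding.
const buffer = Buffer.from('hello world')
const view = buffer.subarray(0, 5) // shares memory with `buffer`
buffer.write('XXXXX', 0) // parser reuses the buffer for the next packet
console.log(view.toString()) // prints 'XXXXX', not 'hello'
// Copying first (e.g. Buffer.from(view)) before handing the bytes out avoids this.
```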
Do you have benchmarks? I don’t find that explanation convincing, since parsing is synchronous and the callback doesn’t have a mechanism to turn anything asynchronous into backpressure.
After benchmarking I found that it's not really helpful 😢 But I found another optimization that gave a 500 ms boost in the `parseRow` function: `node-postgres/packages/pg/lib/result.js`, line 62 in 1b022f8.
From my benchmarking, look at the total time. Base: [profiler screenshot] After my change: [profiler screenshot]
I want to discuss this potential improvement with someone; where can I do this? Cc: @charmander @brianc
In a pull request, or an issue, or a comment.
So basically: I replace the per-row parsing loop at `node-postgres/packages/pg/lib/result.js` line 62 (in 1b022f8) with a `new Function` that is created in the `addFields` function.
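A minimal sketch of the idea (hypothetical names, not the actual patch; `parsers[i]` is assumed to be the text parser already resolved from column `i`'s type OID):

```js
// Hypothetical sketch: compile one row-parsing function per result shape.
// `fields` come from the RowDescription message.
function compileRowParser(fields, parsers) {
  const props = fields.map(
    (f, i) =>
      `${JSON.stringify(f.name)}: rowData[${i}] !== null ? parsers[${i}](rowData[${i}]) : null`
  )
  // eslint-disable-next-line no-new-func
  return new Function(
    'parsers',
    `return function parseRow(rowData) {\n  return {\n    ${props.join(',\n    ')}\n  }\n}`
  )(parsers)
}
```

Quoting the column names with `JSON.stringify` keeps raw field names out of the generated source, so arbitrary names can't inject code.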
So for this table:

```sql
create table if not exists a_table_with_lot_of_data
(
id serial primary key,
name varchar(255),
number_1 int,
number_2 int,
number_3 int,
number_4 int,
number_5 int,
number_6 int,
number_7 int,
number_8 int,
number_9 int,
number_10 int,
boolean_1 boolean,
boolean_2 boolean,
boolean_3 boolean,
json_1 json,
json_2 json,
json_3 json,
json_4 json,
json_5 json,
json_6 json,
date_1 timestamp,
date_2 timestamp,
date_3 timestamp,
date_4 timestamp,
date_5 timestamp,
date_6 timestamp,
date_7 timestamp,
date_8 timestamp,
date_9 timestamp,
date_10 timestamp
);
```

and this query:

```sql
select * from a_table_with_lot_of_data;
```

it will generate something like this:

```js
(function anonymous(rowData
) {
return {
id: rowData[0] !== null ? parseInt(rowData[0], 10) : null,
name: rowData[1] !== null ? String(rowData[1]) : null,
number_1: rowData[2] !== null ? parseInt(rowData[2], 10) : null,
number_2: rowData[3] !== null ? parseInt(rowData[3], 10) : null,
number_3: rowData[4] !== null ? parseInt(rowData[4], 10) : null,
number_4: rowData[5] !== null ? parseInt(rowData[5], 10) : null,
number_5: rowData[6] !== null ? parseInt(rowData[6], 10) : null,
number_6: rowData[7] !== null ? parseInt(rowData[7], 10) : null,
number_7: rowData[8] !== null ? parseInt(rowData[8], 10) : null,
number_8: rowData[9] !== null ? parseInt(rowData[9], 10) : null,
number_9: rowData[10] !== null ? parseInt(rowData[10], 10) : null,
number_10: rowData[11] !== null ? parseInt(rowData[11], 10) : null,
boolean_1: rowData[12] !== null ? rowData[12] !== null ? rowData[12] === 't' || rowData[12] === 'TRUE' || rowData[12] === 'true' || rowData[12] === 'y' || rowData[12] === 'yes' || rowData[12] === 'on' || rowData[12] === '1' : null : null,
boolean_2: rowData[13] !== null ? rowData[13] !== null ? rowData[13] === 't' || rowData[13] === 'TRUE' || rowData[13] === 'true' || rowData[13] === 'y' || rowData[13] === 'yes' || rowData[13] === 'on' || rowData[13] === '1' : null : null,
boolean_3: rowData[14] !== null ? rowData[14] !== null ? rowData[14] === 't' || rowData[14] === 'TRUE' || rowData[14] === 'true' || rowData[14] === 'y' || rowData[14] === 'yes' || rowData[14] === 'on' || rowData[14] === '1' : null : null,
json_1: rowData[15] !== null ? JSON.parse(rowData[15]) : null,
json_2: rowData[16] !== null ? JSON.parse(rowData[16]) : null,
json_3: rowData[17] !== null ? JSON.parse(rowData[17]) : null,
json_4: rowData[18] !== null ? JSON.parse(rowData[18]) : null,
json_5: rowData[19] !== null ? JSON.parse(rowData[19]) : null,
json_6: rowData[20] !== null ? JSON.parse(rowData[20]) : null,
date_1: rowData[21] !== null ? String(rowData[21]) : null,
date_2: rowData[22] !== null ? String(rowData[22]) : null,
date_3: rowData[23] !== null ? String(rowData[23]) : null,
date_4: rowData[24] !== null ? String(rowData[24]) : null,
date_5: rowData[25] !== null ? String(rowData[25]) : null,
date_6: rowData[26] !== null ? String(rowData[26]) : null,
date_7: rowData[27] !== null ? String(rowData[27]) : null,
date_8: rowData[28] !== null ? String(rowData[28]) : null,
date_9: rowData[29] !== null ? String(rowData[29]) : null,
date_10: rowData[30] !== null ? String(rowData[30]) : null
};
})
```

that will be executed in the `parseRow` function. When running the benchmarks, the base gave:

```
warmup done
little queries: 3510
sequence queries: 12200
insert queries: 77241
Warming up bytea test
little queries: 3380
sequence queries: 11943
insert queries: 72755
Warming up bytea test
little queries: 3517
sequence queries: 12192
insert queries: 68964
Warming up bytea test
little queries: 3416
sequence queries: 11087
insert queries: 52893
Warming up bytea test
```

and after my change I got:

```
start
little queries: 2837
sequence queries: 12473
insert queries: 69321
Warming up bytea test
little queries: 3546
sequence queries: 14820
insert queries: 70378
Warming up bytea test
little queries: 3548
sequence queries: 14947
insert queries: 70638
Warming up bytea test
little queries: 3461
sequence queries: 13298
insert queries: 50437
Warming up bytea test
```

WDYT?
pg used to do that, but it was removed. Even implementing it without the vulnerability, I don’t like dynamic code generation for this.
But the vulnerability was fixed; why was this optimization removed, then?
Sorry, I misremembered when it was removed. See #1603.
If I bring it back and fix the memory leak mentioned in #1603, how likely is it to be merged?
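One possible approach (a hypothetical sketch reusing the `compileRowParser` sketch above; the cap and key scheme are assumptions, not a reviewed design):

```js
// Hypothetical sketch: bound the number of compiled row parsers so an
// unbounded number of distinct result shapes cannot leak memory. A Map
// iterates in insertion order, so deleting the first key evicts FIFO-style.
const MAX_CACHED_PARSERS = 100
const parserCache = new Map()

function getRowParser(fields, parsers) {
  const key = fields.map((f) => `${f.name}:${f.dataTypeID}`).join(',')
  let parseRow = parserCache.get(key)
  if (!parseRow) {
    parseRow = compileRowParser(fields, parsers)
    if (parserCache.size >= MAX_CACHED_PARSERS) {
      parserCache.delete(parserCache.keys().next().value)
    }
    parserCache.set(key, parseRow)
  }
  return parseRow
}
```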
Not completely sure about brianc’s thoughts on it today, but I don’t think the marginal performance improvement is worth having dynamic code (and some complicated leak workaround). There are probably lots of other areas to optimize before resorting to that.
Improve the performance of the parser by splitting the parse function into three steps.
This improves performance: it adds backpressure and reduces event-loop lag, since each message is parsed asynchronously.
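A minimal sketch of the general shape (hypothetical stage boundaries and names, not the code in this PR):

```js
const { Transform, Readable, Writable, pipeline } = require('node:stream')

// Stage 1: split the raw byte stream into complete protocol frames
// (1-byte message code + 4-byte big-endian length that includes itself).
class FrameSplitter extends Transform {
  constructor() {
    super({ readableObjectMode: true })
    this.buffered = Buffer.alloc(0)
  }
  _transform(chunk, _enc, cb) {
    this.buffered = Buffer.concat([this.buffered, chunk])
    while (this.buffered.length >= 5) {
      const frameLength = this.buffered.readUInt32BE(1) + 1
      if (this.buffered.length < frameLength) break
      // copy the frame so buffer reuse can't corrupt it downstream
      this.push(Buffer.from(this.buffered.subarray(0, frameLength)))
      this.buffered = this.buffered.subarray(frameLength)
    }
    cb()
  }
}

// Stage 2: decode each frame into a message object.
class MessageParser extends Transform {
  constructor() {
    super({ objectMode: true })
  }
  _transform(frame, _enc, cb) {
    // code is the raw message-type byte
    cb(null, { code: frame[0], length: frame.readUInt32BE(1), bytes: frame })
  }
}

// Demo with one fake frame instead of a real socket.
const payload = Buffer.from('data')
const frame = Buffer.alloc(5 + payload.length)
frame.write('X', 0)
frame.writeUInt32BE(4 + payload.length, 1)
payload.copy(frame, 5)

pipeline(
  Readable.from([frame]),
  new FrameSplitter(),
  new MessageParser(),
  new Writable({
    objectMode: true,
    write(msg, _enc, cb) {
      console.log(msg.code, msg.length)
      cb()
    },
  }),
  (err) => {
    if (err) console.error('pipeline failed', err)
  }
)
```

Each stage only receives the next chunk once the downstream stage has consumed the previous one, which is where the backpressure comes from.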
I would like your opinion before cleaning up the code duplication.
Also, I saw that this package supports Node versions 10 and 12, which have been end-of-life for a long time; that prevents using async iterators in `pipeline`.
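For reference, this is the async-iterator style that dropping Node 10/12 would unlock (a hypothetical example; generator stages in `stream.pipeline` require newer Node):

```js
const { pipeline } = require('node:stream/promises')
const { Readable } = require('node:stream')

async function main() {
  await pipeline(
    Readable.from([Buffer.from('one'), Buffer.from('two')]),
    // An async generator as a pipeline stage: pull-based, so backpressure
    // falls out of the for-await loop instead of explicit Transform classes.
    async function* (chunks) {
      for await (const chunk of chunks) {
        yield `parsed: ${chunk.toString()}`
      }
    },
    // An async function as the final consumer.
    async function (messages) {
      for await (const message of messages) {
        console.log(message)
      }
    }
  )
}

main().catch(console.error)
```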