[MAJOR VERSION] Async iterators #1135

Merged · 76 commits · Jun 2, 2020
Conversation

alubbe (Member) commented Feb 18, 2020

This PR is the result of a lot of groundwork laid by previous PRs #1127 #1125 #829 #1142 #1190 #1140 #1139.

I'm happy to report that this PR adds 200 lines of documentation and deletes over 400 lines of code, all while keeping the tests green, maintaining performance, and making our code base much easier to reason about and maintain.

I've thrown out everything related to streams and the custom flow-control logic around them, and replaced it with generators (sync wherever possible, async where necessary). One external stream API remains exposed at the end, built on the generators underneath.

This PR is now ready to be merged. It supports node v8 via the es5 import.

Here is the new usage:

```js
const workbookReader = new ExcelJS.stream.xlsx.WorkbookReader('./file.xlsx');
for await (const worksheetReader of workbookReader) {
  for await (const row of worksheetReader) {
    // ... process each row as it is parsed
  }
}
```
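For readers unfamiliar with the protocol behind this loop: an object becomes consumable with `for await...of` by implementing `Symbol.asyncIterator`, most easily as an async generator. Here is a minimal sketch of that pattern; `RowReader` and its in-memory rows are illustrative stand-ins, not exceljs internals (the real readers parse rows lazily from the zip stream):

```javascript
// Hypothetical sketch: any class with an async-generator method keyed by
// Symbol.asyncIterator can be consumed with for-await-of.
class RowReader {
  constructor(rows) {
    this.rows = rows;
  }

  async *[Symbol.asyncIterator]() {
    for (const row of this.rows) {
      // The real reader would await I/O here; we just yield in-memory values.
      yield row;
    }
  }
}

(async () => {
  const seen = [];
  for await (const row of new RowReader([1, 2, 3])) {
    seen.push(row);
  }
  console.log(seen); // logs [ 1, 2, 3 ]
})();
```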


alubbe commented Feb 20, 2020

All the tests are green - I'm really excited to hear what you think of this approach @Siemienik @guyonroche

If we like it, I would love to add support for it to our streaming reader as well, so that consumers of our library can use for-await-of on the streams that we expose

... and make our streams destroyable without creating memory leaks ;)


alubbe commented Feb 21, 2020

And here are the benchmarking numbers. First, current master:

####################################################
WARMUP: Current memory usage: 8.63 MB
WARMUP: huge xlsx file profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file profiling finished in 6961ms
WARMUP: Current memory usage (before GC): 134.96 MB
WARMUP: Current memory usage (after GC): 51.7 MB

####################################################
RUN 1: huge xlsx file profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file profiling finished in 6490ms
RUN 1: Current memory usage (before GC): 177.75 MB
RUN 1: Current memory usage (after GC): 51.78 MB

####################################################
RUN 2: huge xlsx file profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file profiling finished in 6683ms
RUN 2: Current memory usage (before GC): 148.7 MB
RUN 2: Current memory usage (after GC): 51.67 MB

####################################################
RUN 3: huge xlsx file profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file profiling finished in 6350ms
RUN 3: Current memory usage (before GC): 176.76 MB
RUN 3: Current memory usage (after GC): 51.63 MB

And after this PR:

####################################################
WARMUP: Current memory usage: 8.61 MB
WARMUP: huge xlsx file profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file profiling finished in 6823ms
WARMUP: Current memory usage (before GC): 148.66 MB
WARMUP: Current memory usage (after GC): 41.22 MB

####################################################
RUN 1: huge xlsx file profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file profiling finished in 6749ms
RUN 1: Current memory usage (before GC): 115.94 MB
RUN 1: Current memory usage (after GC): 25.15 MB

####################################################
RUN 2: huge xlsx file profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file profiling finished in 6653ms
RUN 2: Current memory usage (before GC): 97.41 MB
RUN 2: Current memory usage (after GC): 25.12 MB

####################################################
RUN 3: huge xlsx file profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file profiling finished in 6797ms
RUN 3: Current memory usage (before GC): 145.61 MB
RUN 3: Current memory usage (after GC): 41.11 MB

In short, we have a slight slowdown (3-4%), but lower memory usage and much simpler control flow in our code, which makes it more maintainable.
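The before/after-GC numbers above can be produced with a small harness along these lines. This is a hedged sketch with hypothetical helper names, not the actual benchmark script; note that `global.gc` only exists when node is started with `--expose-gc`, so the sketch guards for it:

```javascript
// Hypothetical profiling harness sketch (run with: node --expose-gc bench.js).
function currentMemoryMB() {
  // heapUsed is reported in bytes; convert to MB for readable log lines.
  return (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(2);
}

async function profile(label, fn) {
  const start = Date.now();
  await fn();
  console.log(`${label}: huge xlsx file profiling finished in ${Date.now() - start}ms`);
  console.log(`${label}: Current memory usage (before GC): ${currentMemoryMB()} MB`);
  if (global.gc) {
    // Force a collection so the "after GC" number reflects retained memory.
    global.gc();
    console.log(`${label}: Current memory usage (after GC): ${currentMemoryMB()} MB`);
  }
}

profile('RUN 1', async () => {
  // ... read the workbook with for-await-of here ...
});
```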

@Siemienik (Member) left a comment

OK, so we are waiting until v3.9.3 has been published.


alubbe commented Apr 15, 2020

I found and fixed another issue with the zip iteration (see ZJONSSON/node-unzipper#191), and have also deprecated CSV#createInputStream.


alubbe commented Apr 27, 2020

I've resolved all merge conflicts; this PR is now ready to be merged.

alubbe (Member, author) commented on the changed test:

```diff
 it('should fail fast on a huge file', function() {
-  this.timeout(20000);
+  this.timeout(5000);
```
This test is currently super flaky on master. This PR fixes it so reliably that we can decrease the timeout from 20s to 5s.

@alubbe force-pushed the async-iterators branch from e11eaf0 to 24d7a5a on May 18, 2020 15:55

alubbe commented May 20, 2020

@guyonroche can we please merge this?

alubbe added 2 commits June 2, 2020 20:04