-
Notifications
You must be signed in to change notification settings - Fork 1.8k
1842: New xlsx option for ignoring certain nodes for improved performance #2132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
can someone from the maintainers group review this? |
I would like, however currently I'm overloaded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea on the topic of performance optimization. Thank you for your contribution! I added some proposals to update Readme and reduce unnecessary if. I hope that's not a problem for you 😄 @hofnarwillie, I appreciate your effort, the tests added, and your code is cool 👍
@skypesky or @zurmokeeper, would you like to update README_zh?
Will it merge on the next MergeFest
This PR was reviewed during the MergeFest session.
Sure, I'm happy to do that, and after this pr is merged into master, I'll raise a new PR to update README_zh. |
Summary
This PR is to mitigate the issue described here: #1842
When the xlsx document contains a single dataValidation applied to the entire worksheet (or a very large range) using a cell address like
A1:AAA1048000
then theDataValidationsXform.parseClose()
function loops through each cell in the range and unless you have a super computer it runs out of memory. In some cases these validations are not relevant to the context in which the file is being parsed, so a simple performance improvement is to skip over specific XLSX nodes (in this case dataValidations).Some unrelated linting issues were fixed automatically by the husky pre-commit. I've left them in this PR.
Test plan
Included a couple of integration tests and ensured all the other tests still pass. Also updated the README to describe the new option.
Related to source code (for typings update)
Added typescript typings for
options
passed intoworkbook.xlsx.readFile()
andworkbook.xlsx.load()
. I have not made any changes to thestream.xslx.*
interfaces. Would appreciate some input here from the maintainers of the package as I don't know if it is necessary (i.e. if the stream based implementations will use the same options and parsing techniques).