Slow and high memory readFile for some xlsx #535

Closed
salterok opened this issue Apr 9, 2018 · 3 comments

@salterok

salterok commented Apr 9, 2018

I'm using this library to read xlsx files and gather some useful information from them.
At some point I noticed that the process sometimes crashes. While investigating, I found that some files (those with a noticeable number of rows) are very slow to parse with workbook.xlsx.readFile(file), or fail to parse at all.

I was using version 0.6.2 and have now upgraded to 1.1.1; the issue is reproducible on both.

I'm attaching an example file that crashes the Node process with an out-of-memory error while it is being parsed:
1523272420-Member_Register_Data_Sheet_(V2).xlsx

@salterok
Author

salterok commented Apr 9, 2018

It seems that the handling of "Defined Names" causes this issue.

As far as I can see, the setter of the definedNames model iterates over all definedNames ranges and initializes every named cell to its default value.

The previously attached file contains several definedNames with a big range (e.g. $T$14:$T$1048576), so the parser tries to set default values for all cells in that range. It spends a lot of time there and, worse, uses a lot of memory (over 1.5 GB), which causes the out-of-memory crash.
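
To give a feel for the size of such a range, here is a rough sketch that counts how many cells one A1-style range covers. The helpers `columnToNumber` and `countCellsInRange` are made up for illustration and are not part of exceljs; the sketch only does the arithmetic and does not reproduce what the library does internally.

```typescript
// Hypothetical helper (not exceljs API): convert column letters such as "T" to a 1-based index.
function columnToNumber(letters: string): number {
  let n = 0;
  for (let i = 0; i < letters.length; i++) {
    n = n * 26 + (letters.charCodeAt(i) - 64); // "A" has char code 65
  }
  return n;
}

// Hypothetical helper (not exceljs API): count the cells covered by a range like "$T$14:$T$1048576".
function countCellsInRange(range: string): number {
  const match = /^\$?([A-Z]+)\$?(\d+):\$?([A-Z]+)\$?(\d+)$/.exec(range);
  if (!match) {
    throw new Error("Unsupported range: " + range);
  }
  const cols = Math.abs(columnToNumber(match[3]) - columnToNumber(match[1])) + 1;
  const rows = Math.abs(Number(match[4]) - Number(match[2])) + 1;
  return cols * rows;
}

console.log(countCellsInRange("$T$14:$T$1048576")); // 1048563
```

So a single defined name like this one covers more than a million cells, and the file has several of them.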

@holm
Contributor

holm commented Nov 11, 2018

We have run into this as well. I wonder whether we really need to initialize the cells for a given "Defined Names" entry, or whether they can just be left uninitialized.
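
As a rough illustration of what "left uninitialized" could look like, the sketch below keeps only the bounds of each named range and answers lookups on demand, without creating any cells. Everything here (`RangeBounds`, `LazyNamedRanges`, the name "Members") is hypothetical and is not how exceljs stores defined names; it only shows that a membership check does not need the cells to exist.

```typescript
// Hypothetical types for illustration only; exceljs does not expose these.
interface RangeBounds {
  top: number;
  left: number;
  bottom: number;
  right: number;
}

class LazyNamedRanges {
  // Only the bounds are stored, so memory use does not depend on the range size.
  private ranges: { [name: string]: RangeBounds[] } = {};

  add(name: string, bounds: RangeBounds): void {
    if (!this.ranges[name]) {
      this.ranges[name] = [];
    }
    this.ranges[name].push(bounds);
  }

  // Return the names whose ranges contain the given 1-based row and column.
  namesAt(row: number, col: number): string[] {
    return Object.keys(this.ranges).filter((name) =>
      this.ranges[name].some(
        (b) => row >= b.top && row <= b.bottom && col >= b.left && col <= b.right
      )
    );
  }
}

const named = new LazyNamedRanges();
// Column T is column 20, so this corresponds to $T$14:$T$1048576.
named.add("Members", { top: 14, left: 20, bottom: 1048576, right: 20 });
console.log(named.namesAt(500000, 20)); // contains "Members"
console.log(named.namesAt(13, 20)); // empty, row 13 is above the range
```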

@salterok
Author

@holm I don't know exactly, but we have patched this part of the code and it works fine (note that we do not use the "defined names" feature and we only read files).

Here is the code that patches exceljs version 1.1.2 so that it does not initialize Defined Names, in case anyone is interested.

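import * as DefinedNames from "exceljs/dist/es5/doc/defined-names";

// Placeholder setter: ignores whatever model is assigned.
function noop() {}

(function patchDefinedNames() {
    // Look up the existing "model" accessor on the prototype.
    const desc = Object.getOwnPropertyDescriptor(DefinedNames.prototype, "model");
    // Keep the original getter, but replace the setter so that
    // assigning a model no longer initializes the named cells.
    Object.defineProperty(DefinedNames.prototype, "model", {
        get: desc.get,
        set: noop
    });
})();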
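
For anyone who wants to see what this patch does without pulling in exceljs, here is a small self-contained sketch of the same descriptor trick on a stand-in class. `FakeDefinedNames` and `ignoreModel` are made up for the example and only mimic the shape of a class with a `model` getter and setter; the real exceljs class is not involved.

```typescript
// Stand-in class for illustration only; it is not the exceljs DefinedNames class.
class FakeDefinedNames {
  private names: string[] = [];
  public setterCalls = 0;

  get model(): string[] {
    return this.names;
  }

  set model(value: string[]) {
    // In the real library this is where the expensive work would happen.
    this.setterCalls++;
    this.names = value;
  }
}

// Hypothetical replacement setter that discards the assigned value.
function ignoreModel(): void {}

const original = Object.getOwnPropertyDescriptor(FakeDefinedNames.prototype, "model");
if (!original) {
  throw new Error("model accessor not found on prototype");
}
// Same idea as the patch: keep the getter, swap the setter.
Object.defineProperty(FakeDefinedNames.prototype, "model", {
  get: original.get,
  set: ignoreModel
});

const dn = new FakeDefinedNames();
dn.model = ["$T$14:$T$1048576"]; // silently ignored after the patch
console.log(dn.model.length, dn.setterCalls); // 0 0
```

The trade-off is the one mentioned above: after the patch, defined names are silently dropped on load, so this is only reasonable if you do not rely on them.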
