
[BUG] Extremely poor performance with slow readFile and slow writeFile on very small files #1842

Open
oscarcoding opened this issue Sep 22, 2021 · 7 comments

@oscarcoding

🐛 Bug Report

Many files are extremely slow to read and write, despite being very small, e.g. less than 100 KB. For example, a file with 2 sheets, columns A-L, and 200 rows.

This occurs when reading/writing on the API side, i.e. using Node.js.

Lib version: 4.3.0, Node v10, on a Docker image.
This was also occurring on exceljs 3.x.

Steps To Reproduce

const path = require('path')
const Excel = require('exceljs')

const filePath = path.join(__dirname, 'myfile.xlsx')
const workbook = new Excel.Workbook()
await workbook.xlsx.readFile(filePath)
// takes 10 seconds

const outFilePath = path.join(__dirname, 'myfile_updated.xlsx')
await workbook.xlsx.writeFile(outFilePath)
// takes 45 seconds to one minute

Logs output:

[02:30:31 5170]  START readFile .... 
[02:30:41 1680]  ... DONE readFile
[02:30:41 1700]  START writeFile ....
[02:31:24 9620]   ... DONE writeFile
[02:31:24 9630]  Set headers
[02:31:24 9630]  Send file

Also, it occasionally crashes with an out-of-memory error when writing the file, which suggests the slow performance is due to excessive memory use. This is a 67 KB file; it should not cause an out-of-memory error.

==== JS stack trace =========================================
 0: ExitFrame [pc: 0x22878ba5bf1d]
 Security context: 0x2c572fb9e6c1 <JSObject>
    1: /* anonymous */ [0x3333573c1149] [/var/www/api/node_modules/exceljs/lib/xlsx/xform/sheet/data-validations-xform.js:~36] [pc=0x22878ba61947](this=0x3cc2c678d461 <JSGlobal Object>,dataValidation=0x0cfda8d0ac19 <Object map = 0x369bde2c94d9>,address=0x168ed0a4c5a1 <String[7]: K486175>)
    2: /* anonymous */ [0x3333573c11c1] [/var/www/api/node_modules/exc...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Hopefully the stack trace helps pinpoint where this is occurring.

I have tried the patchDefinedNames fix suggested in other issues; however, it does not work because DefinedNames.prototype is undefined:

(function patchDefinedNames() {
  console.log('START patchDefinedNames', DefinedNames, DefinedNames.prototype)
  try {
    const desc = Object.getOwnPropertyDescriptor(DefinedNames.prototype, "model");
    console.log('START defineProperty', desc)
    if (desc) {
      Object.defineProperty(DefinedNames.prototype, "model", {
        get: desc.get,
        set: noop
      });
    }
    console.log('DONE defineProperty')
  } catch (error) {
    console.log('ERROR in patchDefinedNames', error)
  }
  console.log('DONE patchDefinedNames')
})();

Log output from the fix is:

START patchDefinedNames { default: [Function: DefinedNames] } undefined }
ERROR in patchDefinedNames TypeError: Cannot convert undefined or null to object
    at Object.getOwnPropertyDescriptor (<anonymous>)
    at getOwnPropertyDescriptor (/var/www/api/node_modules/core-js/library/fn/object/get-own-property-descriptor.js:4:18)
    at Timeout.patchDefinedNames [as _onTimeout] (/var/www/api/dist/controllers/approval/index.js:71:18)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10)
DONE patchDefinedNames

This error occurs because DefinedNames.prototype is undefined, so this suggested fix does not work as of v4.3.0.
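
Judging from the log above, DefinedNames here is a module namespace object ({ default: [Function: DefinedNames] }) rather than the class itself, which is why the prototype lookup fails. Below is a sketch of the same patch that unwraps the default export first; the require path is an assumption based on the exceljs source layout, and this is not an official fix.

// Sketch only: apply the same property patch as above, but unwrap the
// transpiler's default export first. The require path is an assumption.
const DefinedNamesModule = require('exceljs/lib/doc/defined-names');
const DefinedNamesClass = DefinedNamesModule.default || DefinedNamesModule;
const noop = () => {};

const desc = Object.getOwnPropertyDescriptor(DefinedNamesClass.prototype, 'model');
if (desc) {
  Object.defineProperty(DefinedNamesClass.prototype, 'model', {
    get: desc.get,
    set: noop,
  });
}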

Note that there are many other reported issues regarding read/write performance, and it does not look like they are being fixed. This appears to be a serious, ongoing problem:

#535,
#1494,
#1609,
#1017,
#1597,
#847,
#742,
#566,
#527,
#282,
#125

The expected behaviour:

The file should be read in under 1 second.
The file should be written in 1-2 seconds; obviously file operations have some overhead.

If this is not fixable then sadly we will have to stop using exceljs.

@lket

lket commented Sep 29, 2021

We had a similar problem when the input xlsx had a data validation set on the whole column (with the first few rows excluded). When we limited it to the first few hundred rows (something like rows 3-500), the issue disappeared.
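
For anyone generating these files themselves, here is a minimal sketch of that bounding idea, using the per-cell dataValidation API from the exceljs README; the sheet name, column, list values, and row cut-off (500) are just placeholders.

const Excel = require('exceljs');

// Sketch: apply a list validation only to the rows that will actually hold
// data (an assumed 500 here) instead of the entire column, which is what
// triggers the huge cell-address enumeration on read.
const workbook = new Excel.Workbook();
const sheet = workbook.addWorksheet('data');

for (let row = 3; row <= 500; row++) {
  sheet.getCell(`A${row}`).dataValidation = {
    type: 'list',
    allowBlank: true,
    formulae: ['"Yes,No"'],
  };
}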

@tkt028

tkt028 commented Apr 1, 2022

For data with around 3,000 rows and 13 columns, exceljs could generate the Excel file quickly.
I have a data table with more than 500 columns and 12,000 rows. I waited more than 15 minutes, but no Excel file was generated. This is a big minus. By comparison, the csv-stringify package takes around 4 seconds to generate a CSV report of the same data.

@yugu91

yugu91 commented Apr 21, 2022

Same for me: writing a 700 KB file with 4K+ rows takes 2 s on an M1 and 9 s on the server.

@ryegye24

ryegye24 commented May 16, 2022

I was running into a problem like this on a 49 KB file that wouldn't finish loading even after literally days. I did a deep dive, and this is what I found.

When loading a file with data validations applied to ranges of cells, exceljs enumerates the address of each individual cell the validation applies to and stores it (see https://github.com/exceljs/exceljs/blob/master/lib/xlsx/xform/sheet/data-validations-xform.js#L218). This goes relatively quickly (a little less than a millisecond per address on my machine) until it hits 8,388,607 total cell addresses across all validations, at which point it immediately slows WAY down (~3 seconds per address on my machine). As it happened, my small spreadsheet had a lot of data validations applied to large ranges (well past what actually existed in the spreadsheet).

Here's the odd thing though: exceljs fully supports data validations by ranges (see #1109 (comment)). I ended up with a really screwy workaround where I pre-parsed the spreadsheet using JSZip and xmldom to remove the data validations from the sheet XML files, loaded the spreadsheet with exceljs, and then added the data validations back in by their original ranges, and that worked. It seems like it should be possible to change the loading logic to just use the ranges and skip the cell-address enumeration.
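
A simplified sketch of that pre-parse step, in case it helps anyone else stuck on this; it uses a regex instead of xmldom for brevity, and re-adding the validations by range afterwards is left out.

const fs = require('fs');
const JSZip = require('jszip');
const Excel = require('exceljs');

// Sketch: strip <dataValidations> blocks from each worksheet's XML before
// handing the file to exceljs, so the cell-address enumeration never runs.
async function loadWithoutDataValidations(filePath) {
  const zip = await JSZip.loadAsync(fs.readFileSync(filePath));

  for (const name of Object.keys(zip.files)) {
    if (/^xl\/worksheets\/sheet\d+\.xml$/.test(name)) {
      const xml = await zip.file(name).async('string');
      zip.file(name, xml.replace(/<dataValidations[\s\S]*?<\/dataValidations>/g, ''));
    }
  }

  const workbook = new Excel.Workbook();
  await workbook.xlsx.load(await zip.generateAsync({ type: 'nodebuffer' }));
  return workbook;
}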

@papandreou
Contributor

papandreou commented May 27, 2022

I just ran into the same thing with a weird (but non-malicious) spreadsheet that got uploaded to a service I'm working on.

Somehow it ended up containing this crazy data validation:

<dataValidation type="list" allowBlank="1" showInputMessage="1" showErrorMessage="1" sqref="F1583:F1048576 G1583:I1048576 Q1583:Q1048576 U1583:AB1048576 S1583:S1048576 L1583:L1048576 AD1583:AE1048576"><formula1>#REF!</formula1></dataValidation>

... which then makes exceljs loop over 17,798,898 cells and set a hash key for each(!)

I came up with this super hacky workaround to turn Range#forEachAddress into a noop:

require('exceljs/lib/doc/range').prototype.forEachAddress = function () {};

It would be nice to avoid that. In my application I don't really need these data validations to be applied, so it would be great if they could be turned off. Alternatively we could use the existing maxCols / maxRows to intersect the ranges with the supported data area.
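
Along those lines, here is a sketch of a less drastic monkey-patch than the no-op above: clamp the enumerated range instead of skipping it entirely. The 10,000-row cap and the col-cache require path are assumptions for illustration only.

const Range = require('exceljs/lib/doc/range');
const colCache = require('exceljs/lib/utils/col-cache');

// Sketch only: cap how many rows of a range get enumerated, rather than
// disabling forEachAddress completely. MAX_DV_ROWS is an arbitrary limit.
const MAX_DV_ROWS = 10000;

Range.prototype.forEachAddress = function (cb) {
  const bottom = Math.min(this.bottom, this.top + MAX_DV_ROWS - 1);
  for (let col = this.left; col <= this.right; col++) {
    for (let row = this.top; row <= bottom; row++) {
      cb(colCache.encodeAddress(row, col), row, col);
    }
  }
};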

@mavyfaby

Any updates on this?

@markygab

markygab commented Oct 2, 2024

I've just run into this exact issue too.

My 2c: it seems Google Sheets cuts these data validation ranges of 1,048,576 rows (the maximum Excel supports, by the way) down to 1,000 rows.

My "fix" for this is to limit the aforementioned forEachAddress method on the Range class to 1000 rows as follows. Had to fork to achieve this.

forEachAddress(cb) {
  for (let col = this.left; col <= this.right; col++) {
    for (let row = this.top; row <= Math.min(this.bottom, 1000); row++) {
      cb(colCache.encodeAddress(row, col), row, col);
    }
  }
}

Maybe a contributor who knows more might be able to limit it to the actual number of rows that contain data, if that information exists?
