Skip to content

[BUG] Worksheets with a lot of merged cells take forever to parse #2689

Open
@aorsten

Description

@aorsten

🐛 Bug Report

Files with too many merged cells will not load - they take too long.

Lib version: 4.4.0

Steps To Reproduce

  1. Create an Excel book with a worksheet.
  2. In the worksheet, add a large amount of rows where one cell is merged on each row. See attached file.
    toomanymerges.xlsx
  3. Try to parse the workbook using:
  const buffer = fs.readFileSync(filename);
  const workbook = new Workbook();
  console.time('Buffer');
  await workbook.xlsx.load(buffer);
  1. Experience how it takes a long long time to parse.

The expected behaviour:

It should not be so slow. The attached file takes 1.5 minutes to parse in my Node environment. I have encountered worse examples from users of our product, and it runs in the browser, so those instances have just never finished.

Possible solution (optional, but very helpful):

I have noticed that the slow culprit is the _mergeCellsInternal function inside worksheet.js, specifically the check that cells aren't already merges. Which makes sense, because it is built with n squared loops. See my discussion here: #2568

// worksheet.js, ._mergeCellsInternal
    // check cells aren't already merged
    _.each(this._merges, merge => {
      if (merge.intersects(dimensions)) {
        throw new Error('Cannot merge already merged cells');
      }
    });

The above code is the culprit. When I comment out this code, parsing is very fast again. Is there a way this could be more efficient? Is it even necessary? Is it necessary on first parse of the document?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions