Skip to content

[BUG] Worksheets with a lot of merged cells take forever to parse #2689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
aorsten opened this issue Feb 13, 2024 · 3 comments
Open

[BUG] Worksheets with a lot of merged cells take forever to parse #2689

aorsten opened this issue Feb 13, 2024 · 3 comments

Comments

@aorsten
Copy link

aorsten commented Feb 13, 2024

🐛 Bug Report

Files with too many merged cells will not load - they take too long.

Lib version: 4.4.0

Steps To Reproduce

  1. Create an Excel book with a worksheet.
  2. In the worksheet, add a large amount of rows where one cell is merged on each row. See attached file.
    toomanymerges.xlsx
  3. Try to parse the workbook using:
  const buffer = fs.readFileSync(filename);
  const workbook = new Workbook();
  console.time('Buffer');
  await workbook.xlsx.load(buffer);
  1. Experience how it takes a long long time to parse.

The expected behaviour:

It should not be so slow. The attached file takes 1.5 minutes to parse in my Node environment. I have encountered worse examples from users of our product, and it runs in the browser, so those instances have just never finished.

Possible solution (optional, but very helpful):

I have noticed that the slow culprit is the _mergeCellsInternal function inside worksheet.js, specifically the check that cells aren't already merges. Which makes sense, because it is built with n squared loops. See my discussion here: #2568

// worksheet.js, ._mergeCellsInternal
    // check cells aren't already merged
    _.each(this._merges, merge => {
      if (merge.intersects(dimensions)) {
        throw new Error('Cannot merge already merged cells');
      }
    });

The above code is the culprit. When I comment out this code, parsing is very fast again. Is there a way this could be more efficient? Is it even necessary? Is it necessary on first parse of the document?

@aorsten
Copy link
Author

aorsten commented Feb 15, 2024

Question: Rather than loop every other merge to see if there's a collision, couldn't the following:

    // apply merge
    const master = this.getCell(dimensions.top, dimensions.left);
    for (let i = dimensions.top; i <= dimensions.bottom; i++) {
      for (let j = dimensions.left; j <= dimensions.right; j++) {
        // merge all but the master cell
        if (i > dimensions.top || j > dimensions.left) {
          this.getCell(i, j).merge(master, ignoreStyle);
        }
      }
    }

Check the cells in question and see if they are already part of a merge? As such there are only n-merged cells included in the check?

@aorsten
Copy link
Author

aorsten commented Feb 15, 2024

My attempt at a fix in a PR: #2691

@hubyhub
Copy link

hubyhub commented Feb 26, 2024

thanks you very much! I would also desperately need exactly that fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants