Description
🐛 Bug Report
Files with too many merged cells will not load - they take too long.
Lib version: 4.4.0
Steps To Reproduce
- Create an Excel book with a worksheet.
- In the worksheet, add a large amount of rows where one cell is merged on each row. See attached file.
toomanymerges.xlsx - Try to parse the workbook using:
const buffer = fs.readFileSync(filename);
const workbook = new Workbook();
console.time('Buffer');
await workbook.xlsx.load(buffer);
- Experience how it takes a long long time to parse.
The expected behaviour:
It should not be so slow. The attached file takes 1.5 minutes to parse in my Node environment. I have encountered worse examples from users of our product, and it runs in the browser, so those instances have just never finished.
Possible solution (optional, but very helpful):
I have noticed that the slow culprit is the _mergeCellsInternal
function inside worksheet.js
, specifically the check that cells aren't already merges. Which makes sense, because it is built with n squared loops. See my discussion here: #2568
// worksheet.js, ._mergeCellsInternal
// check cells aren't already merged
_.each(this._merges, merge => {
if (merge.intersects(dimensions)) {
throw new Error('Cannot merge already merged cells');
}
});
The above code is the culprit. When I comment out this code, parsing is very fast again. Is there a way this could be more efficient? Is it even necessary? Is it necessary on first parse of the document?