Skip to content

[BUG] stream.xlsx.WorkbookReader.parse() doesn't cleanup temporary files if not fully iterated #2147

@Turtlefight

Description

@Turtlefight

🐛 Bug Report

stream.xlsx.WorkbookReader.parse() does not cleanup the temporary xml files it creates if iteration is stopped prematurely.
This happens whenever iteration is prematurely stopped, e.g.:

const workbookReader = new exceljs.stream.xlsx.WorkbookReader(/*...*/);
for await (const worksheetReader of workbookReader) {
  // stop iteration
  break;
}

which causes all temporary xml files generated by the WorkbookReader to not be cleaned up until process termination.
This is primarily a problem for long-running server applications, since those may be up for very long timespans (i noticed this bug due to 50GB+ of xml files hanging around in the tmp directory of an express application that had been running for ~ 2 months)

Lib version: 4.3.0

Steps To Reproduce

Here's a code example that demonstrates the problem:

  • any-excel-file.xlsx can be any excel file (contents don't matter)

  • For exampleGood() "Cleanup called" will be outputted for each worksheet in the excel file (and the associated temporary file will be deleted)

  • For exampleBad() "Cleanup called" will NEVER be outputted, and all associated temporary files will remain until process termination.

const exceljs = require("exceljs");
const fs = require("fs");
const tmp = require("tmp");

// hooking tmp.file to output "Cleanup called" whenever
// the cleanupCallback gets called
const origTmpFile = tmp.file;
tmp.file = function(cb) {
  origTmpFile(function(err, path, fd, cleanupCallback) {
    cb(err, path, fd, function() {
      console.log("Cleanup called");
      cleanupCallback();
    });
  });
};

async function exampleGood() {
  const workbookReader = new exceljs.stream.xlsx.WorkbookReader(fs.createReadStream("./any-excel-file.xlsx"), {});
  for await (const worksheetReader of workbookReader) {
    // iteration will run until the end
  }
}

async function exampleBad() {
  const workbookReader = new exceljs.stream.xlsx.WorkbookReader(fs.createReadStream("./any-excel-file.xlsx"), {});
  for await (const worksheetReader of workbookReader) {
    // aborting iteration prematurely
    // throw or return would have the same effect
    break;
  }
}


async function main() {
  console.log("GOOD:");
  await exampleGood();

  // "Cleanup called" will NOT be outputted in this case:
  console.log("BAD:");
  await exampleBad();

  console.log("DONE");
}

main();

The expected behaviour:

Temporary files should be cleaned up if iteration stops before reaching the end.

Possible solution (optional, but very helpful):

for await / for in javascript will automatically call .return() on the (async-) iterable if it doesn't complete iteration (to give the iterable a chance to cleanup)

For (async-)generator iterables .return() will result in finally blocks around the yield statement to be executed, e.g.:

async function* foo() {
  try {
    yield 1;
    console.log("AFTER YIELD");
    yield 2;
    yield 3;
  } finally {
    console.log("FINALLY");
  }
}

for await(const _ of foo()) {
  break;
}

will output "FINALLY" (but NOT "AFTER YIELD")

So in this case adding a large try-finally around parse() should suffice to cleanup the temporary files.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions