Skip to content

Introducing styleCacheMode. Up to 3x performance improvements on xlsx… #2867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

brunoargolo
Copy link
Contributor

@brunoargolo brunoargolo commented Dec 19, 2024

Currently useStyles can have a dramatic impact on xlsx write performance, both on stream and non stream writer.

A large part of the problem is how WeakMap is used in conjunction with style object handling on cells. While this can be worked around as evidenced here: #2041 its an external workaround and does not give users that much flexibility.

Most of the times style objects are either cloned or created anew, so the same style configuration are actually distinct objects and the WeakMap in styles-xform.js does not recognize them as the same style config. This causes the WeakMap to grow very large and have few hits per key, in some cases actually hindering performance compared to not cacheing. While its possible to have good performance with WeakMap the developer needs to be very aware of object re-use and its not obvious for most.

A great example of poor WeakMap performance is when there is a Colum/Row style combination or a style is set inside a loop with a cell.style={..} notation. An example of when is has good performance is when a style is created outside a loop and assigned to many cells via cell.style = myStyleRef; But this also causes issues if a style is modified indirectly impacting all other cells (addressing this is not in scope for this change).

This PR introduces an option when creating a workbook styleCacheMode.
If not specified it behaves just like today using WeakMap so this feature is 100% backwards compatible.

const options = {
  filename: './streamed-workbook.xlsx',
  useStyles: true,
  styleCacheMode: 'FAST_MAP,
  useSharedStrings: true
};
const workbook = new Excel.stream.xlsx.WorkbookWriter(options);

WEAK_MAP: has poor performance if the style objects are not re-used. When you mix Col, row and cell styles a different style object is created per cell and performance deteriorates to being worse then with No Cache.
JSON_MAP: uses JSON.stringify as the key for a style map. The perfromance can be similar to WeakMap or up to 2.5x faster.
FAST_MAP: uses a custom function to encode a style to use it as a key on cache. Should be preferred over JSON_MAP. The encoded style is much smaller and faster to generate then a JSON. The encoding function is designed so two distinct styles can never be encoded to the same key, but if this happens use JSON_MAP instead.
NO_CACHE: In some cases NO_CACHE can be faster than WEAK_MAP. In rare cases it maybe faster than JSON_MAP.

Beanchmark on t3.large for 200k rows with 7 columns:

Mode|AVG (seconds)|x slower than useStyles:false
NO_STYLES|4.16|1.00
WEAK_MAP|21.58|5.19
JSON_MAP|9.24|2.22
FAST_MAP|7.56|1.82
NO_CACHE|19.97|4.81

I have also introduced a convenience method on cell.js called addStyle(style). This is for when you want to add a style to a cell but still respect inheritance of the Col and Row styles.

I'd be happy to make adjustments as needed.

Summary

Improve performance on xlsx writer when useStyles: true.
Cases like and similar for both stream and non stream API: #2041

Test plan

I have added a new unit test file to test FAST_MAP serialization and a new bench mark testing the different cache modes.
to run the benchmark

npm run benchmark:styles

Related to source code (for typings update)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants