Introducing styleCacheMode. Up to 3x performance improvements on xlsx… #2867
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Currently useStyles can have a dramatic impact on xlsx write performance, both on stream and non stream writer.
A large part of the problem is how WeakMap is used in conjunction with style object handling on cells. While this can be worked around as evidenced here: #2041 its an external workaround and does not give users that much flexibility.
Most of the times style objects are either cloned or created anew, so the same style configuration are actually distinct objects and the WeakMap in styles-xform.js does not recognize them as the same style config. This causes the WeakMap to grow very large and have few hits per key, in some cases actually hindering performance compared to not cacheing. While its possible to have good performance with WeakMap the developer needs to be very aware of object re-use and its not obvious for most.
A great example of poor WeakMap performance is when there is a Colum/Row style combination or a style is set inside a loop with a cell.style={..} notation. An example of when is has good performance is when a style is created outside a loop and assigned to many cells via cell.style = myStyleRef; But this also causes issues if a style is modified indirectly impacting all other cells (addressing this is not in scope for this change).
This PR introduces an option when creating a workbook styleCacheMode.
If not specified it behaves just like today using WeakMap so this feature is 100% backwards compatible.
WEAK_MAP: has poor performance if the style objects are not re-used. When you mix Col, row and cell styles a different style object is created per cell and performance deteriorates to being worse then with No Cache.
JSON_MAP: uses JSON.stringify as the key for a style map. The perfromance can be similar to WeakMap or up to 2.5x faster.
FAST_MAP: uses a custom function to encode a style to use it as a key on cache. Should be preferred over JSON_MAP. The encoded style is much smaller and faster to generate then a JSON. The encoding function is designed so two distinct styles can never be encoded to the same key, but if this happens use JSON_MAP instead.
NO_CACHE: In some cases NO_CACHE can be faster than WEAK_MAP. In rare cases it maybe faster than JSON_MAP.
Beanchmark on t3.large for 200k rows with 7 columns:
Mode|AVG (seconds)|x slower than useStyles:false
NO_STYLES|4.16|1.00
WEAK_MAP|21.58|5.19
JSON_MAP|9.24|2.22
FAST_MAP|7.56|1.82
NO_CACHE|19.97|4.81
I have also introduced a convenience method on cell.js called addStyle(style). This is for when you want to add a style to a cell but still respect inheritance of the Col and Row styles.
I'd be happy to make adjustments as needed.
Summary
Improve performance on xlsx writer when useStyles: true.
Cases like and similar for both stream and non stream API: #2041
Test plan
I have added a new unit test file to test FAST_MAP serialization and a new bench mark testing the different cache modes.
to run the benchmark
Related to source code (for typings update)