Skip to content

Faster builders for immutable.{TreeMap,TreeSet} by using mutation internally #8794

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Oct 22, 2020

Conversation

mkeskells
Copy link
Contributor

@mkeskells mkeskells commented Mar 7, 2020

Summary

Reduce CPU and allocations when building RedBlackTree based structures (immutable.{TreeMap, TreeSet}) by using mutation in the builder. To make this work, the internal encoding of RedBlackTree is changed to use a single class, rather than encoding the colour into separate classes.

Mutability Handling

Mutable Trees are never contained in a TreeMap, or TreeSet, only ever in internal bulk operations of the builder. An immutable Tree cannot have a mutable Tree as a child, but a mutable Tree may have an immutable Tree as a child. This is to promote structural sharing

The major change the Tree is that the size, mutability, and colour are all determined by the _count field of the Tree. In the prior implementation the count ranged from 1 .. MaxInt, and the colour was based on the class. In the new implementation the colour is encoded in the top bit of _count and a value of 0 in the lower 31 bits indicated that the Tree is mutable. This way the range of count is maintained, and the other features are encoded within the same JVM footprint

Once a Tree has transitioned from mutable to immutable it can not become mutable again, so we use a copy-on-write to yield the mutable structure. Before the copy we always check that the values require a change, and if not return the structure unmodified to reduce garbage and promote structural sharing

All of the methods that can yield a mutable result have "mutable" on their name, and generally there is another method similarly named with doesn't. This is to aid safety and to reduce the cognitive load when reviewing changes. e.g.
upd with update an immutable Tree, producing an immutable Tree
mutableUpd will update a mutable or immutable Tree and may return a mutable or immutable Tree

When the series of mutable operations are complete, and the Tree needs to be exposed to the TreeMap/TreeSet the builder calls beforePublish which ensures that the returned Tree is immutable, and memory fences are enforced

Compatibliity:

The existing serialization form of these structures can be decoded. This is achieved keeping the RedBlackTree.{RedTree, BlackTree} classes in place with readResolve methods to generate instances of the new class, NewRedBlackTree.Tree. The new tests show that that the old format can be deserialized.

Conversely, the new structure has a writeReplace to convert to the old classes before serialization. This resulting serialized form is not byte-for-byte identical to before, because the type of leftTree/rightTree is now AnyRef, and Java serialization encodes this in the wire format, although it is not used in deserialization. The resulting format can be decoded on older version of 2.11.11 as follows:

➜ CL=/Users/jz/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/apache/commons/commons-lang3/3.11/commons-lang3-3.11.jar

➜ scala-launch 2.12.12 -nc -cp "$CL" -e 'java.nio.file.Files.write(java.nio.file.Paths.get("/tmp/ser.out"), org.apache.commons.lang3.SerializationUtils.serialize(scala.collection.mutable.TreeSet(1, 2, 3)))'

➜ scala-pr 8794 -nc -cp "$CL" -e 'println(org.apache.commons.lang3.SerializationUtils.deserialize(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("/tmp/ser.out"))))'
TreeSet(1, 2, 3)

➜ scala-pr 8794 -nc -cp "$CL" -e 'java.nio.file.Files.write(java.nio.file.Paths.get("/tmp/ser.out"), org.apache.commons.lang3.SerializationUtils.serialize(scala.collection.mutable.TreeSet(1, 2, 3)))'
➜ scala-launch 2.12.12 -nc -cp "$CL" -e 'println(org.apache.commons.lang3.SerializationUtils.deserialize(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("/tmp/ser.out"))))'
TreeSet(1, 2, 3)

The system properties scala.collection.immutable.TreeSet.newSerialisation and scala.collection.immutable.TreeMap.newSerialisation can be set to true if users don't require serialization compatiblity with older Scala 2.12 releases and want to opt-in to a more direct and compact serialization format, like the approach already in use in 2.13.x.

Irrespective of these setting the new reader will read old and new formats. The above setting apply to the format generated after this change. Prior code will not be able to read the new format

Performance.

The builder builds trees about 40-50% less CPU, and generally creates little or no garbage for ++= and +=.
When ++= is called with a compatible tree (i.e. with the same Ordering) the builder uses the existing union method, so there may be some garbage but the CPU usage is reduced further, and it promotes structural sharing, which seems a good trade-off


*** review notes

The first 3 commits are enablers ( backport of the RB tree, serialisation issues etc). The body of the mutability change is the later ones

*** current status
I am still looking a couple of places where the performance can be improved, but these will be small point fixes.

Detailed performance figures will be published in a separate comment

@scala-jenkins scala-jenkins added this to the 2.12.12 milestone Mar 7, 2020
@mkeskells
Copy link
Contributor Author

I have left in the working to aid review and comment. Its a bit of an evolution

the model is to only have one tree type, and mutability/colour /size encoded in the count

mutable trees can have mutable children, but immutable trees can only have immutable children

you can see the evolution in upd -> mutableUpd -> mutableUpdWith[out]Overwrite -> SetHelper

also balanceRight ->balanceRight1 -> balanceRight(tree)

@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch 2 times, most recently from a91452e to 87fa763 Compare March 9, 2020 09:02
@szeiger
Copy link
Contributor

szeiger commented Mar 9, 2020

I'm skeptical about the mutability handling. Balancing as part of node creation and deletion already avoids creating an initial node which gets rebalanced immediately. Rebalancing and bulk operations work mostly along a path into the tree. It's not clear to me that you would gain anything by keeping the new nodes (and all parents) initially mutable and then doing another pass to freeze them.

The benefit for bulk building is quite obvious but this could be done without the extra work to handle mutability.

@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch 2 times, most recently from e1d43fc to 9071c8b Compare March 9, 2020 19:24
@mkeskells mkeskells added library:collections PRs involving changes to the standard collection library performance the need for speed. usually compiler performance, sometimes runtime performance. performance:do_not_allocate Changes to avoid object allocations labels Mar 12, 2020
@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch from 9071c8b to a10d108 Compare March 24, 2020 09:02
@mkeskells mkeskells changed the title [WIP] [AfterBackport] RedBlackTree with fast mutable builder [WIP] RedBlackTree with fast mutable builder Mar 24, 2020
@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch 2 times, most recently from ca81ad3 to 38de269 Compare March 29, 2020 21:31
@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch 2 times, most recently from 437836b to 87b0f89 Compare April 21, 2020 00:26
Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small observations

@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch 5 times, most recently from 5855d0a to 21ac188 Compare April 29, 2020 07:10
@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch 2 times, most recently from b9ee19d to 8678154 Compare May 5, 2020 22:07
@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch from 8678154 to a05438b Compare May 8, 2020 15:02
@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch 2 times, most recently from aaeb4db to 04be2c4 Compare May 11, 2020 09:18
@retronym
Copy link
Member

Now that you have specific test cases for TreeMap / TreeSet serialization that account for the fact that the old format can be deserialized, I suggest removing the failing tests from test.files.run.t8549.scala (which are failing because they are overly strict, requiring that de- and re-seralizing the old format should be an identity).

add SerialVersionUID to Set and Map proxy
regenerate and record(int he test) the existing 2.12 serial format for TreeMap and TreeSet
leave comment to explain the areas that need attention in the forward port to 2.13
… classes

provide a simple implementation of addAll for TreMap ad TreeSet
add some tests to cover this case
minor opt to some mutable methods - when converting from immutable to a new mutable Tree, the counts should be initial values (making the tree mutable), not the current counts (making it immutable)
add documentation, remove some duplicate code and tighten visibility/naming
use a final val to inline constant value. Remove duplicate constants
@mkeskells mkeskells force-pushed the 2.12.x_RedBlackMutable branch from 1cfc952 to e6e2e3d Compare October 8, 2020 23:41
@mkeskells
Copy link
Contributor Author

Backport merged: #9229

thanks @dwijnand - rebased to pick up new tests 🤞

@mkeskells
Copy link
Contributor Author

@dwijnand @retronym can this be merged yet or do we need some more tests /review

@retronym retronym merged commit 95a569b into scala:2.12.x Oct 22, 2020
@SethTisue SethTisue added the release-notes worth highlighting in next release notes label Oct 22, 2020
retronym pushed a commit to retronym/scala that referenced this pull request Oct 29, 2020
Cherry picked from  e9d811e..e6e2e3d / scala#8794

The various approaches used to maintain serialization compatibility
in 2.12.x can be dropped as they are obviated by use of serialization
proxies in 2.13.x.
retronym pushed a commit to retronym/scala that referenced this pull request Oct 29, 2020
Cherry picked from  e9d811e..e6e2e3d / scala#8794

The various approaches used to maintain serialization compatibility
in 2.12.x can be dropped as they are obviated by use of serialization
proxies in 2.13.x.
retronym pushed a commit to retronym/scala that referenced this pull request Oct 29, 2020
Cherry picked from  e9d811e..e6e2e3d / scala#8794

The various approaches used to maintain serialization compatibility
in 2.12.x can be dropped as they are obviated by use of serialization
proxies in 2.13.x.
dwijnand pushed a commit to retronym/scala that referenced this pull request Oct 29, 2020
Cherry picked from  e9d811e..e6e2e3d / scala#8794

The various approaches used to maintain serialization compatibility
in 2.12.x can be dropped as they are obviated by use of serialization
proxies in 2.13.x.
@retronym retronym changed the title RedBlackTree with fast mutable builder [ci: last-only] Faster builders for immutable.{TreeMap,TreeSet} by using mutation internally Dec 4, 2020
finaglehelper pushed a commit to twitter/finatra that referenced this pull request Mar 11, 2021
Problem
=======

Upgrade source to https://github.com/scala/scala/releases/tag/v2.12.13 from https://github.com/scala/scala/releases/tag/v2.12.12. This upgrade includes a highlighted feature of configurable warnings and errors (https://www.scala-lang.org/2021/01/12/configuring-and-suppressing-warnings.html / scala/scala#9248).

Other noticable changes our code based will experience

* Exhaustivity warnings for patterns involving tuples
* Better type checking for `CharSequence` arguments scala/scala#9292
* Simplified Reporters scala/scala#8338
* Promotion of `-deprecation` to `-Xlint:deprecation` scala/scala#7714
* Improves performance of building `immutable.{TreeMap,TreeSet}` by using mutation within the builder scala/scala#8794

Full list of changes can be found at https://github.com/scala/scala/pulls?q=is%3Amerged+milestone%3A2.12.13+

Solution
========

* Bump `BUILD` file `SCALA_REV` && `SCALAC_REV_FOR_DEPS`.
* Depdnencies which are not built for scala 2.12.13 needed to be bumped `SEMANTICDB_PLUGIN_REV` && `SEMANTICDB_REV` && `SCALAFIX_REV`.
* Include `-S-Xlint:-deprecation` to `pants.ini` preventing build failures on deprecated annotations (existing behavior)
* Bump `cache_key_gen_version` in `pants.ini` for newly built artifacts on different scala version.
* Removed scalafix plugin (`scalac-profiling`) which is not built for 2.12.13. `scalac-profiling` code looks abandoned by ~3 years.
* Updated all failing tests that could have depended or expected ordered sequences when the sequence was generated from non ordered collections.

Notes
=====

It seems a few tests and golden files in source have depended or tested against a stable sort order when sorted order is not guaranteed. This has been seen in a few places such as output json objects (map key fields), code that uses `groupBy` for rekeying results to the user and transforming into sequences, strings built from sequences or maps that are not guaranteed to be ordered.

The following PR are related to this change in expectations

scala/scala#8794
scala/scala#8948
scala/scala#9218
scala/scala#9376
scala/scala#9091
scala/scala#9216

We took the liberty updating tests with what the current map order would be or updating the test in a way that wouldn't depend on order. Since we may not fully understand all the context of the tests we are hoping this either signals to the owners that there might be some issue with assumed order or signal that upgrading might break implementations due to bug in scala 2.12.13. {D611202} will improve the files changed for workflow builder outside of this change.

Please feel to reach out directly and discuss the changes here or bring up issues with this upgrade. Slack [[https://app.slack.com/client/T86S8GHEG/C01NZAFRLFK|#scala-upgrade-2-12-13]]

JIRA Issues: SCALA-25

Differential Revision: https://phabricator.twitter.biz/D607826
hamzaremmal pushed a commit to hamzaremmal/scala3 that referenced this pull request May 2, 2025
Cherry picked from  e9d811ef..e6e2e3df / scala/scala#8794

The various approaches used to maintain serialization compatibility
in 2.12.x can be dropped as they are obviated by use of serialization
proxies in 2.13.x.
hamzaremmal pushed a commit to scala/scala3 that referenced this pull request May 7, 2025
Cherry picked from  e9d811ef..e6e2e3df / scala/scala#8794

The various approaches used to maintain serialization compatibility
in 2.12.x can be dropped as they are obviated by use of serialization
proxies in 2.13.x.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
library:collections PRs involving changes to the standard collection library performance:do_not_allocate Changes to avoid object allocations performance the need for speed. usually compiler performance, sometimes runtime performance. prio:blocker release blocker (used only by core team, only near release time) release-notes worth highlighting in next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants