-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Faster builders for immutable.{TreeMap,TreeSet} by using mutation internally #8794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I have left in the working to aid review and comment. Its a bit of an evolution the model is to only have one tree type, and mutability/colour /size encoded in the count mutable trees can have mutable children, but immutable trees can only have immutable children you can see the evolution in upd -> mutableUpd -> mutableUpdWith[out]Overwrite -> SetHelper also balanceRight ->balanceRight1 -> balanceRight(tree) |
a91452e
to
87fa763
Compare
I'm skeptical about the mutability handling. Balancing as part of node creation and deletion already avoids creating an initial node which gets rebalanced immediately. Rebalancing and bulk operations work mostly along a path into the tree. It's not clear to me that you would gain anything by keeping the new nodes (and all parents) initially mutable and then doing another pass to freeze them. The benefit for bulk building is quite obvious but this could be done without the extra work to handle mutability. |
e1d43fc
to
9071c8b
Compare
9071c8b
to
a10d108
Compare
ca81ad3
to
38de269
Compare
437836b
to
87b0f89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small observations
5855d0a
to
21ac188
Compare
b9ee19d
to
8678154
Compare
8678154
to
a05438b
Compare
aaeb4db
to
04be2c4
Compare
Now that you have specific test cases for |
add SerialVersionUID to Set and Map proxy regenerate and record(int he test) the existing 2.12 serial format for TreeMap and TreeSet
leave comment to explain the areas that need attention in the forward port to 2.13
… classes provide a simple implementation of addAll for TreMap ad TreeSet
add some tests to cover this case
minor opt to some mutable methods - when converting from immutable to a new mutable Tree, the counts should be initial values (making the tree mutable), not the current counts (making it immutable)
add documentation, remove some duplicate code and tighten visibility/naming
use a final val to inline constant value. Remove duplicate constants
1cfc952
to
e6e2e3d
Compare
Cherry picked from e9d811e..e6e2e3d / scala#8794 The various approaches used to maintain serialization compatibility in 2.12.x can be dropped as they are obviated by use of serialization proxies in 2.13.x.
Cherry picked from e9d811e..e6e2e3d / scala#8794 The various approaches used to maintain serialization compatibility in 2.12.x can be dropped as they are obviated by use of serialization proxies in 2.13.x.
Cherry picked from e9d811e..e6e2e3d / scala#8794 The various approaches used to maintain serialization compatibility in 2.12.x can be dropped as they are obviated by use of serialization proxies in 2.13.x.
Cherry picked from e9d811e..e6e2e3d / scala#8794 The various approaches used to maintain serialization compatibility in 2.12.x can be dropped as they are obviated by use of serialization proxies in 2.13.x.
Problem ======= Upgrade source to https://github.com/scala/scala/releases/tag/v2.12.13 from https://github.com/scala/scala/releases/tag/v2.12.12. This upgrade includes a highlighted feature of configurable warnings and errors (https://www.scala-lang.org/2021/01/12/configuring-and-suppressing-warnings.html / scala/scala#9248). Other noticable changes our code based will experience * Exhaustivity warnings for patterns involving tuples * Better type checking for `CharSequence` arguments scala/scala#9292 * Simplified Reporters scala/scala#8338 * Promotion of `-deprecation` to `-Xlint:deprecation` scala/scala#7714 * Improves performance of building `immutable.{TreeMap,TreeSet}` by using mutation within the builder scala/scala#8794 Full list of changes can be found at https://github.com/scala/scala/pulls?q=is%3Amerged+milestone%3A2.12.13+ Solution ======== * Bump `BUILD` file `SCALA_REV` && `SCALAC_REV_FOR_DEPS`. * Depdnencies which are not built for scala 2.12.13 needed to be bumped `SEMANTICDB_PLUGIN_REV` && `SEMANTICDB_REV` && `SCALAFIX_REV`. * Include `-S-Xlint:-deprecation` to `pants.ini` preventing build failures on deprecated annotations (existing behavior) * Bump `cache_key_gen_version` in `pants.ini` for newly built artifacts on different scala version. * Removed scalafix plugin (`scalac-profiling`) which is not built for 2.12.13. `scalac-profiling` code looks abandoned by ~3 years. * Updated all failing tests that could have depended or expected ordered sequences when the sequence was generated from non ordered collections. Notes ===== It seems a few tests and golden files in source have depended or tested against a stable sort order when sorted order is not guaranteed. This has been seen in a few places such as output json objects (map key fields), code that uses `groupBy` for rekeying results to the user and transforming into sequences, strings built from sequences or maps that are not guaranteed to be ordered. The following PR are related to this change in expectations scala/scala#8794 scala/scala#8948 scala/scala#9218 scala/scala#9376 scala/scala#9091 scala/scala#9216 We took the liberty updating tests with what the current map order would be or updating the test in a way that wouldn't depend on order. Since we may not fully understand all the context of the tests we are hoping this either signals to the owners that there might be some issue with assumed order or signal that upgrading might break implementations due to bug in scala 2.12.13. {D611202} will improve the files changed for workflow builder outside of this change. Please feel to reach out directly and discuss the changes here or bring up issues with this upgrade. Slack [[https://app.slack.com/client/T86S8GHEG/C01NZAFRLFK|#scala-upgrade-2-12-13]] JIRA Issues: SCALA-25 Differential Revision: https://phabricator.twitter.biz/D607826
Cherry picked from e9d811ef..e6e2e3df / scala/scala#8794 The various approaches used to maintain serialization compatibility in 2.12.x can be dropped as they are obviated by use of serialization proxies in 2.13.x.
Cherry picked from e9d811ef..e6e2e3df / scala/scala#8794 The various approaches used to maintain serialization compatibility in 2.12.x can be dropped as they are obviated by use of serialization proxies in 2.13.x.
Summary
Reduce CPU and allocations when building
RedBlackTree
based structures (immutable.{TreeMap, TreeSet}
) by using mutation in the builder. To make this work, the internal encoding ofRedBlackTree
is changed to use a single class, rather than encoding the colour into separate classes.Mutability Handling
Mutable
Tree
s are never contained in aTreeMap
, orTreeSet
, only ever in internal bulk operations of the builder. An immutableTree
cannot have a mutableTree
as a child, but a mutableTree
may have an immutableTree
as a child. This is to promote structural sharingThe major change the
Tree
is that the size, mutability, and colour are all determined by the_count
field of theTree
. In the prior implementation thecount
ranged from 1 .. MaxInt, and the colour was based on the class. In the new implementation the colour is encoded in the top bit of_count
and a value of 0 in the lower 31 bits indicated that theTree
is mutable. This way the range of count is maintained, and the other features are encoded within the same JVM footprintOnce a
Tree
has transitioned from mutable to immutable it can not become mutable again, so we use a copy-on-write to yield the mutable structure. Before the copy we always check that the values require a change, and if not return the structure unmodified to reduce garbage and promote structural sharingAll of the methods that can yield a mutable result have "mutable" on their name, and generally there is another method similarly named with doesn't. This is to aid safety and to reduce the cognitive load when reviewing changes. e.g.
upd
with update an immutableTree
, producing an immutableTree
mutableUpd
will update a mutable or immutableTree
and may return a mutable or immutableTree
When the series of mutable operations are complete, and the
Tree
needs to be exposed to theTreeMap
/TreeSet
the builder callsbeforePublish
which ensures that the returnedTree
is immutable, and memory fences are enforcedCompatibliity:
The existing serialization form of these structures can be decoded. This is achieved keeping the
RedBlackTree.{RedTree, BlackTree}
classes in place withreadResolve
methods to generate instances of the new class,NewRedBlackTree.Tree
. The new tests show that that the old format can be deserialized.Conversely, the new structure has a
writeReplace
to convert to the old classes before serialization. This resulting serialized form is not byte-for-byte identical to before, because the type ofleftTree/rightTree
is nowAnyRef
, and Java serialization encodes this in the wire format, although it is not used in deserialization. The resulting format can be decoded on older version of 2.11.11 as follows:The system properties
scala.collection.immutable.TreeSet.newSerialisation
andscala.collection.immutable.TreeMap.newSerialisation
can be set totrue
if users don't require serialization compatiblity with older Scala 2.12 releases and want to opt-in to a more direct and compact serialization format, like the approach already in use in 2.13.x.Irrespective of these setting the new reader will read old and new formats. The above setting apply to the format generated after this change. Prior code will not be able to read the new format
Performance.
The builder builds trees about 40-50% less CPU, and generally creates little or no garbage for
++=
and+=
.When
++=
is called with a compatible tree (i.e. with the sameOrdering
) the builder uses the existingunion
method, so there may be some garbage but the CPU usage is reduced further, and it promotes structural sharing, which seems a good trade-off*** review notes
The first 3 commits are enablers ( backport of the RB tree, serialisation issues etc). The body of the mutability change is the later ones
*** current status
I am still looking a couple of places where the performance can be improved, but these will be small point fixes.
Detailed performance figures will be published in a separate comment