-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Implement Set as a core class #13074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
|
set.c
Outdated
# endif | ||
#endif | ||
|
||
typedef struct set_table set_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to split the set_table
internal implementation and the Set
type, because there are other places in the VM that could benefit from using a set_table
instead of a st_table
, one example being the fstring_table
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'll work on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place that could use a set_stable
is the constant cache tracker:
Line 6355 in 7564793
st_insert(ics, (st_data_t) ic, (st_data_t) Qtrue); |
set.c
Outdated
struct set_table_entry { | ||
set_hash_t hash; | ||
set_data_t key; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really worth having set_hash_t / set_data_t / set_index_t
? I might be off, but I think re-using st_*
types would be easier and would make it trivial to covert code using an st_table
into using a set_table
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the types deliberately different to avoid ambiguity. However, there is no technical reason we couldn't use the same types, and I agree that it could make conversion from st_table
easier.
If we want to reuse st_*
types, one option is to move the set_table
code into st.c
. There are some functions that are the same between the two implementations, so it would eliminate some duplicate code. What are your thoughts on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on that?
That would be fine by me.
For the record there's an in-flight PR that would also benefit from a set_table
: #13077
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving some more thought to this, before making changes, I think we should discuss this at the developer meeting next week, as other developers may have thoughts about where the code should go. I'll add this to the meeting agenda.
I fixed this at #13078 |
🆙 Here’s 2¢ from my recent experience: KVPair = Struct.new(:key, :value) do
def hash = ~key.hash
def eql?(other) = key.eql?(other.key) rescue nil
end |
Can you explain the advantage of doing that? Seems to use more objects and is probably slower, plus it would require fairly invasive changes. Additionally, |
set.c
Outdated
static int | ||
mark_key(set_data_t key, set_data_t data) | ||
{ | ||
rb_gc_mark((VALUE)key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rb_gc_mark((VALUE)key); | |
rb_gc_mark_movable((VALUE)key); |
I think you need rb_gc_mark_movable
otherwise there is little point implementing dcompact
.
set.c
Outdated
struct set_object { | ||
set_table *table; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct set_object { | |
set_table *table; | |
}; | |
struct set_object { | |
set_table table; | |
}; |
Would probably be better to embed the set_table
and save a pointer chasing in most places. You can even use RUBY_TYPED_EMBEDDABLE
so it's all in the object slot.
set.c
Outdated
.dsize = set_size, | ||
.dcompact = set_compact, | ||
}, | ||
.flags = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed last night, you can set TYPED_FREE_IMMEDIATELY
Shower thought: Probably doesn't matter, or at least largely offseted by the more efficient implementation, but we're losing the AR_TABLE optimization for small sets. |
1cc9fcc
to
9f9821a
Compare
This PR now implements what was agreed to at the dev meeting:
Additionally, it implements the following improvements:
I think this is now ready to be merged. |
#define RSET_LEV_MASK (FL_USER13 | FL_USER14 | FL_USER15 | /* FL 13..19 */ \ | ||
FL_USER16 | FL_USER17 | FL_USER18 | FL_USER19) | ||
#define RSET_LEV_SHIFT (FL_USHIFT + 13) | ||
#define RSET_LEV_MAX 127 /* 7 bits */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but given we have plenty of space to spare inside the object slot, we could skip all that iter-level in flags complexity and just use a long
or something inside struct set_object
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an idea for fitting Set
in an 80-byte slot, which would not leave enough spare space. I think there would be a single padding byte in struct set_object
that could be used, but I don't see much advantage to that over the flags approach.
You are correct that there is a significant amount of complexity here, but it was just copied from hash.c, so I assume there shouldn't be any issues (or at least, no issues that do not also affect Hash
).
If I'm unable to get Set
to fit in 80 bytes, I'll come back to this and simplify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your idea is what is mentioned in 9f9821a (getting rid of typed_flag
), it's not as simple.
The GC need to store one bit of information to know if the object was actually embedded or not (since allocation size is dynamic, if you ask for more than the GC can provide, you won't be embeded).
I have an idea how to reclaim that in the future, but it's a lot of work.
At the end of the day, I don't think Set
being size 160B
is a big deal. It's a bit of a shame we're wasting 72B, but that's still smaller than what the previous implementation was.
set.c
Outdated
RB_OBJ_WRITTEN(set, Qundef, k); | ||
set_insert(table, (st_data_t)k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RB_OBJ_WRITTEN(set, Qundef, k); | |
set_insert(table, (st_data_t)k); | |
set_insert(table, (st_data_t)k); | |
RB_OBJ_WRITTEN(set, Qundef, k); |
Not sure if it has an incidence, but in theory you are supposed to call _WRITTEN
after having created the reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Also, this allows us to only call RB_OBJ_WRITTEN
if the insertion was successful, as you mentioned in another comment. If the insertion was not successful, we did not write to the set_table, and therefore should not call RB_OBJ_WRITTEN
. I've added a commit that adds a couple functions to DRY up this code and ensure that set_insert
is only called in a single place in set.c, as you recommended in another comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@byroot Thank you very much for your detailed review. I just pushed commits to address all of the issues you pointed out.
#define RSET_LEV_MASK (FL_USER13 | FL_USER14 | FL_USER15 | /* FL 13..19 */ \ | ||
FL_USER16 | FL_USER17 | FL_USER18 | FL_USER19) | ||
#define RSET_LEV_SHIFT (FL_USHIFT + 13) | ||
#define RSET_LEV_MAX 127 /* 7 bits */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an idea for fitting Set
in an 80-byte slot, which would not leave enough spare space. I think there would be a single padding byte in struct set_object
that could be used, but I don't see much advantage to that over the flags approach.
You are correct that there is a significant amount of complexity here, but it was just copied from hash.c, so I assume there shouldn't be any issues (or at least, no issues that do not also affect Hash
).
If I'm unable to get Set
to fit in 80 bytes, I'll come back to this and simplify it.
set.c
Outdated
RB_OBJ_WRITTEN(set, Qundef, k); | ||
set_insert(table, (st_data_t)k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Also, this allows us to only call RB_OBJ_WRITTEN
if the insertion was successful, as you mentioned in another comment. If the insertion was not successful, we did not write to the set_table, and therefore should not call RB_OBJ_WRITTEN
. I've added a commit that adds a couple functions to DRY up this code and ensure that set_insert
is only called in a single place in set.c, as you recommended in another comment.
}; | ||
set_iter(set, set_merge_i, (st_data_t)&args); | ||
set_free_embedded(sobj); | ||
memcpy(&sobj->table, new, sizeof(*new)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed in this function as the write barrier are added in set_merge_i
(called a couple lines above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
However I don't know if you cherry-picked the commits from jeremyevans#2, but it seems like @mrnoname1000 authorship was lost in the process. Would be nice to add him back.
Set has been an autoloaded standard library since Ruby 3.2. The standard library Set is less efficient than it could be, as it uses Hash for storage, which stores unnecessary values for each key. Implementation details: * Core Set uses a modified version of `st_table`, named `set_table`. than `s/st_/set_/`, the main difference is that the stored records do not have values, making them 1/3 smaller. `st_table_entry` stores `hash`, `key`, and `record` (value), while `set_table_entry` only stores `hash` and `key`. This results in large sets using ~33% less memory compared to stdlib Set. For small sets, core Set uses 12% more memory (160 byte object slot and 64 malloc bytes, while stdlib set uses 40 for Set and 160 for Hash). More memory is used because the set_table is embedded and 72 bytes in the object slot are currently wasted. Hopefully we can make this more efficient and have it stored in an 80 byte object slot in the future. * All methods are implemented as cfuncs, except the pretty_print methods, which were moved to `lib/pp.rb` (which is where the pretty_print methods for other core classes are defined). As is typical for core classes, internal calls call C functions and not Ruby methods. For example, to check if something is a Set, `rb_obj_is_kind_of` is used, instead of calling `is_a?(Set)` on the related object. * Almost all methods use the same algorithm that the pure-Ruby implementation used. The exception is when calling `Set#divide` with a block with 2-arity. The pure-Ruby method used tsort to implement this. I developed an algorithm that only allocates a single intermediate hash and does not need tsort. * The `flatten_merge` protected method is no longer necessary, so it is not implemented (it could be). * Similar to Hash/Array, subclasses of Set are no longer reflected in `inspect` output. * RDoc from stdlib Set was moved to core Set, with minor updates. This includes a comprehensive benchmark suite for all public Set methods. As you would expect, the native version is faster in the vast majority of cases, and multiple times faster in many cases. There are a few cases where it is significantly slower: * Set.new with no arguments (~1.6x) * Set#compare_by_identity for small sets (~1.3x) * Set#clone for small sets (~1.5x) * Set#dup for small sets (~1.7x) These are slower as Set does not currently use the AR table optimization that Hash does, so a new set_table is initialized for each call. I'm not sure it's worth the complexity to have an AR table-like optimization for small sets (for hashes it makes sense, as small hashes are used everywhere in Ruby). The rbs and repl_type_completor bundled gems will need updates to support core Set. The pull request marks them as allowed failures. This passes all set tests with no changes. The following specs needed modification: * Modifying frozen set error message (changed for the better) * `Set#divide` when passed a 2-arity block no longer yields the same object as both the first and second argument (this seems like an issue with the previous implementation). * Set-like objects that override `is_a?` such that `is_a?(Set)` return `true` are no longer treated as Set instances. * `Set.allocate.hash` is no longer the same as `nil.hash` * `Set#join` no longer calls `Set#to_a` (it calls the underlying C function). * `Set#flatten_merge` protected method is not implemented. Previously, `set.rb` added a `SortedSet` autoload, which loads `set/sorted_set.rb`. This replaces the `Set` autoload in `prelude.rb` with a `SortedSet` autoload, but I recommend removing it and `set/sorted_set.rb`. This moves `test/set/test_set.rb` to `test/ruby/test_set.rb`, reflecting that switch to a core class. This does not move the spec files, as I'm not sure how they should be handled. Internally, this uses the st_* types and functions as much as possible, and only adds set_* types and functions as needed. The underlying set_table implementation is stored in st.c, but there is no public C-API for it, nor is there one planned, in order to keep the ability to change the internals going forward. For internal uses of st_table with Qtrue values, those can probably be replaced with set_table. To do that, include internal/set_table.h. To handle symbol visibility (rb_ prefix), internal/set_table.h uses the same macro approach that include/ruby/st.h uses. The Set class (rb_cSet) and all methods are defined in set.c. There isn't currently a C-API for the Set class, though C-API functions can be added as needed going forward. Implements [Feature #21216] Co-authored-by: Jean Boussier <jean.boussier@gmail.com> Co-authored-by: Oliver Nutter <mrnoname1000@riseup.net>
Certainly. I squashed this into a single commit and added him as a co-author. I'll merge after this passes CI. |
CI is only failing on the same VM_DEBUG ractor btest that is also failing on master, so I am going to merge this. |
Haven't had time to look into this, but while running GHA tests today, the following patch worked:
A few hours later, the following:
The logged error was:
This was running tests against turbo-rails in Puma... |
Thanks for pointing it out, that is indeed an oversight, won't be too hard to fix. |
Yep, I'll work on a patch for Marshal support for Set. |
PR submitted to support marshalling: #13185 |
Set has been an autoloaded standard library since Ruby 3.2. The standard library Set is less efficient than it could be, as it uses Hash for storage, which stores unnecessary values for each key.
Implementation details:
Core Set uses a modified version of
st_table
, namedset_table
. Other thans/st_/set_/
, the main difference is that the stored records do not have values, making them 1/3 smaller.st_table_entry
storeshash
,key
, andrecord
(value), whileset_table_entry
only storeshash
andkey
. This results in large sets using ~33% less memory compared to stdlib Set. For small sets, core Set uses 20% less memory (160 bytes, while stdlib set uses 40 for Set and 160 for Hash).All methods are implemented as cfuncs, except the pretty_print methods, which were moved to
lib/pp.rb
(which is where the pretty_print methods for other core classes are defined). As is typical for core classes, internal calls call C functions and not Ruby methods. For example, to check if something is a Set,rb_obj_is_kind_of
is used, instead of callingis_a?(Set)
on the related object.Almost all methods use the same algorithm that the pure-Ruby implementation used. The exception is when calling
Set#divide
with a block with 2-arity. The pure-Ruby method used tsort to implement this. I developed an algorithm that only allocates a single intermediate hash and does not need tsort.The
flatten_merge
protected method is no longer necessary, so it is not implemented (it could be).Similar to Hash/Array, subclasses of Set are no longer reflected in
inspect
output.RDoc from stdlib Set was moved to core Set, with minor updates.
This includes a comprehensive benchmark suite for all public Set methods. As you would expect, the native version is faster in the vast majority of cases, and multiple times faster in many cases. There are a few cases where it is significantly slower:
These are slower as Set does not currently use the AR table optimization that Hash does, so a new set_table is initialized for each call. I'm not sure it's worth the complexity to have an AR table-like optimization for small sets (for hashes it makes sense, as small hashes are used everywhere in Ruby).
The rbs and repl_type_completor bundled gems will need updates to support core Set. The pull request marks them as allowed failures.
This passes all set tests with no changes. The following specs needed modification:
Set#divide
when passed a 2-arity block no longer yields the same object as both the first and second argument (this seems like an issue with the previous implementation).is_a?
such thatis_a?(Set)
returntrue
are no longer treated as Set instances.Set.allocate.hash
is no longer the same asnil.hash
Set#join
no longer callsSet#to_a
(it calls the underlying C function).Set#flatten_merge
protected method is not implemented.Previously,
set.rb
added aSortedSet
autoload, which loadsset/sorted_set.rb
. This replaces theSet
autoload inprelude.rb
with aSortedSet
autoload, but I recommend removing it andset/sorted_set.rb
.This moves
test/set/test_set.rb
totest/ruby/test_set.rb
, reflecting that switch to a core class. This does not move the spec files, as I'm not sure how they should be handled.