-
Notifications
You must be signed in to change notification settings - Fork 5.5k
RTypedData: keep direct reference to IMEMO/fields #14134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think this may cause issues for embedded RTypedData, as the extra space is used for the embedded data. Also, a lot of code depends on |
It's a possibility, and I admit I didn't go over them all to verify, but the overwhelming majority of them likely have the extra space and won't be bumped to the next slot. e.g.
Yes, my intent is for them to keep being at the same offset, seems like I got it wrong, but it's the plan that they stay that way. |
ddeb5e9
to
be6201e
Compare
I wonder how Set ended up with 8B free. It was right at the limit when ce51ef3 was committed. If it already had 8B free, I wouldn't have needed to implement ce51ef3 to free up 8B so it would fit in an 80B slot. |
My bad. Seems like I miscounted. |
Note to self. The offset still match AFAICT. The crash is with T_ZOMBIE, there might be a mismatch there. |
For the non-embedded RTypedData and for embedded RTypedData with 8B free, this seems a clear win as the space is unused. For the embedded RTypedData with less than 8B free, this doubles the amount of space the objects take. At that point, it becomes a memory vs CPU tradeoff. You could handle that case by using one of the flag bits in the pointer, so it can be an per-object/type decision: // in include/ruby/internal/core/rtypeddata.h
#define TYPED_DATA_EMBEDDED_FIELDS ((VALUE)4)
#define TYPED_DATA_PTR_FLAGS ((VALUE)7) I think that's the last free flag bit on 32-bit arch. Alternatively, you could make a decision based on size of the embedded object (assuming you can get that easily), and only embed if the object has more than 8B free in the object slot. This would also increase complexity, because you would need to handle the slower fallback case. I assume the fallback code is the current code, so this is would involve moving the new code to an if branch, and keeping the current code in an else branch. Assuming you don't want to go that route due to the complexity it would entail, how many RTypedData actually use ivars? You mention that it is "quite common for TypedData", but I don't know if that is 50%, 5%, or something else. Is it possible to include a benchmark that shows the CPU speedup from this approach, so we can compare it to the potential doubling of space for certain objects when making a decision regarding the tradeoff? |
It's all gonna depend on what gems you run. But even in Ruby-core, there are a bunch of typed data that use ivars, e.g. Beyond how much it actually is used, I like the idea of not penalizing the pattern of doing a few core functionality in C, and then extending the class from Ruby like we do in Trilogy: https://github.com/trilogy-libraries/trilogy/blob/029db7d874b9d09d3403e4c2418802f5d54e2c4c/contrib/ruby/lib/trilogy.rb
I included a benchmark in my previous PR where I did something similar for |
be6201e
to
d29e93b
Compare
Ok, so I figured what the bug was, I forgot to update Now for the size. The types that would be bumped:
There's also Out of all of those,
Of course as you mentioned there's always the possibility of having yet another flag to make this a conditional, but it's very tricky as it means different offsets based on the flags, so would be extremely finicky to get it working. |
Since Would that satisfy you? |
I should be clear that I don't object to the change in this PR, even if it increases the size of Set. My comments were just discussing the potential impact on memory and whether issue could be mitigated. In general, you don't need Set for small sets (Array generally works for those), so the size of small Set is not critical. That being said, assuming that the Set change doesn't significantly impact the performance of Set, it would be great to keep Set in an 80B slot. However, if that change works for Set, couldn't it also work for Hash and allow Hash to fit in 80B (if you decrease the ar table size from 8 to 3)? |
Interesting. I think you might be right that it would be possible, but then there's even more tradeoffs to consider. |
It's a bit of an arbitrary application, but I decided to run the YJIT bench shipit benchmark with a modified Ruby to record how often generic ivars are accessed: [["VM/thread", 4886971],
["T_STRING", 248114],
["T_HASH", 235517],
["SQLite3::Backup", 127320],
["xmlDoc", 24300],
["T_ARRAY", 9212],
["OpenSSL/Cipher", 4200],
["xmlNodeSet", 2700],
["xmlNode", 2700],
["T_SYMBOL", 533],
["time", 398],
["encoding", 363],
["T_REGEXP", 213],
["proc", 136],
["T_STRUCT", 74],
["set", 32],
["FFI::Type::Builtin", 21],
["FFI::StructField", 15],
["zstream", 8],
["OpenSSL/X509/STORE", 4],
["T_FLOAT", 4],
["Psych/parser", 3],
["msgpack:buffer_view", 2],
["FFI::Type::Mapped", 1],
["T_RATIONAL", 1],
["FFI::DynamicLibrary", 1],
["T_BIGNUM", 1],
] Some do surprise me a bit (e.g. I'm quite surprised about
Then there's a bunch of common C extensions (sqlite3, nokogiri, openssl). |
Actually, revised number with only object that do actually have ivars. Since trying to access ivars on an object that doesn't have any bails out fast: [["VM/thread", 4886969],
["T_HASH", 229501],
["SQLite3::Backup", 122531],
["T_STRING", 70597],
["xmlDoc", 23625],
["T_ARRAY", 9039],
["OpenSSL/Cipher", 2800],
["xmlNode", 2025],
["encoding", 358],
["time", 199],
["proc", 68],
["T_STRUCT", 38],
["OpenSSL/X509/STORE", 3],
["Psych/parser", 2],
["set", 1], It makes much more sense. |
So for For As for |
Out of curiosity I also looked at the sequel benchmark:
Here T_ARRAY seem to come from the older version of sqlite3-ruby that used Array and Hash subclasses for its result sets, but it has been deprecated and removed recently, so that's good new: sparklemotion/sqlite3-ruby@002443e |
Similar to f3206cc but for TypedData. It's quite common for TypedData objects to have a mix of reference in their struct and some ivars. Since we do happen to have 8B free in the RtypedData struct, we could use it to keep a direct reference to the IMEMO/fields saving having to synchronize the VM and lookup the `gen_fields_tbl` on every ivar access. For old school Data classes however, we don't have free space, but this API is soft-deprecated and no longer very common.
d29e93b
to
32f2d5d
Compare
This comment has been minimized.
This comment has been minimized.
32f2d5d
to
9f0f180
Compare
@jeremyevans I opened #14179 to shrink sets back, but figured an independent PR was best. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me
Similar to f3206cc but for TypedData.
It's quite common for TypedData objects to have a mix of reference in their struct and some ivars.
Since we do happen to have 8B free in the RtypedData struct, we could use it to keep a direct reference to the IMEMO/fields saving having to synchronize the VM and lookup the
gen_fields_tbl
on every ivar access.For old school Data classes however, we don't have free space, but this API is soft-deprecated and no longer very common.
cc @tenderlove as mentioned.