Skip to content

RTypedData: keep direct reference to IMEMO/fields #14134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 12, 2025

Conversation

byroot
Copy link
Member

@byroot byroot commented Aug 6, 2025

Similar to f3206cc but for TypedData.

It's quite common for TypedData objects to have a mix of reference in their struct and some ivars.
Since we do happen to have 8B free in the RtypedData struct, we could use it to keep a direct reference to the IMEMO/fields saving having to synchronize the VM and lookup the gen_fields_tbl on every ivar access.

For old school Data classes however, we don't have free space, but this API is soft-deprecated and no longer very common.

cc @tenderlove as mentioned.

@jeremyevans
Copy link
Contributor

I think this may cause issues for embedded RTypedData, as the extra space is used for the embedded data. Also, a lot of code depends on DATA_PTR(obj) working both when obj is RData and non-embedded RTypedData, so moving fields around is going to be problematic. I found this out when working on ce51ef3.

@byroot
Copy link
Member Author

byroot commented Aug 6, 2025

I think this may cause issues for embedded RTypedData, as the extra space is used for the embedded data.

It's a possibility, and I admit I didn't go over them all to verify, but the overwhelming majority of them likely have the extra space and won't be bumped to the next slot. e.g. Set has 8B free.

so moving fields around is going to be problematic.

Yes, my intent is for them to keep being at the same offset, seems like I got it wrong, but it's the plan that they stay that way.

@byroot byroot force-pushed the rtypeddata-fields-obj branch from ddeb5e9 to be6201e Compare August 6, 2025 18:47
@jeremyevans
Copy link
Contributor

It's a possibility, and I admit I didn't go over them all to verify, but the overwhelming majority of them likely have the extra space and won't be bumped to the next slot. e.g. Set has 8B free.

I wonder how Set ended up with 8B free. It was right at the limit when ce51ef3 was committed. If it already had 8B free, I wouldn't have needed to implement ce51ef3 to free up 8B so it would fit in an 80B slot.

@byroot
Copy link
Member Author

byroot commented Aug 6, 2025

I wonder how Set ended up with 8B free. I

My bad. Seems like I miscounted.

@byroot
Copy link
Member Author

byroot commented Aug 6, 2025

my intent is for them to keep being at the same offset, seems like I got it wrong

Note to self. The offset still match AFAICT. The crash is with T_ZOMBIE, there might be a mismatch there.

@jeremyevans
Copy link
Contributor

For the non-embedded RTypedData and for embedded RTypedData with 8B free, this seems a clear win as the space is unused. For the embedded RTypedData with less than 8B free, this doubles the amount of space the objects take. At that point, it becomes a memory vs CPU tradeoff. You could handle that case by using one of the flag bits in the pointer, so it can be an per-object/type decision:

// in include/ruby/internal/core/rtypeddata.h
#define TYPED_DATA_EMBEDDED_FIELDS ((VALUE)4)
#define TYPED_DATA_PTR_FLAGS ((VALUE)7)

I think that's the last free flag bit on 32-bit arch. Alternatively, you could make a decision based on size of the embedded object (assuming you can get that easily), and only embed if the object has more than 8B free in the object slot. This would also increase complexity, because you would need to handle the slower fallback case. I assume the fallback code is the current code, so this is would involve moving the new code to an if branch, and keeping the current code in an else branch.

Assuming you don't want to go that route due to the complexity it would entail, how many RTypedData actually use ivars? You mention that it is "quite common for TypedData", but I don't know if that is 50%, 5%, or something else. Is it possible to include a benchmark that shows the CPU speedup from this approach, so we can compare it to the potential doubling of space for certain objects when making a decision regarding the tradeoff?

@byroot
Copy link
Member Author

byroot commented Aug 6, 2025

how many RTypedData actually use ivars?

It's all gonna depend on what gems you run. But even in Ruby-core, there are a bunch of typed data that use ivars, e.g. Enumerator. Then in gems, you have quite a bunch in OpenSSL, etc.

Beyond how much it actually is used, I like the idea of not penalizing the pattern of doing a few core functionality in C, and then extending the class from Ruby like we do in Trilogy: https://github.com/trilogy-libraries/trilogy/blob/029db7d874b9d09d3403e4c2418802f5d54e2c4c/contrib/ruby/lib/trilogy.rb

Is it possible to include a benchmark that shows the CPU speedup from this approach,

I included a benchmark in my previous PR where I did something similar for T_STRUCT, it's quite significant: f3206cc, and that's a single threaded benchmark. If you start a Ractor, Ruby then need to acquire the VM lock to access the object's ivars.

@byroot byroot force-pushed the rtypeddata-fields-obj branch from be6201e to d29e93b Compare August 7, 2025 09:24
@casperisfine
Copy link
Contributor

Ok, so I figured what the bug was, I forgot to update rb_data_object_wrap( to account for the fields reordering.

Now for the size. The types that would be bumped:

  • Set as discussed (I didn't notice because I was checking in debug mode, and in debug mode size 2 is 88B...)
  • Process::Status. Used to fit in 40B, because the struct is 16B.
  • frame_info AKA Thread::Backtrace::Location. Also used to be 40B, would now be 80B.

There's also pinned_list, but this one is of dynamic size, so it's not really the same.

Out of all of those, Process::Status and frame_info tend to be very temporary objects, and it's rare to allocate a ton of them, so to me it's a rounding error.

Set is a bit more annoying, and I wish it would still fit in 80B, but in 3.4 it was 200B (40B T_OBJECT + 160B T_HASH), so it's not like it's really a regression.

Of course as you mentioned there's always the possibility of having yet another flag to make this a conditional, but it's very tricky as it means different offsets based on the flags, so would be extremely finicky to get it working.

@casperisfine
Copy link
Contributor

Since Set seem to be the contentious point, another option is to shrink set_table, e.g.: master...Shopify:ruby:set-entries-bounds

Would that satisfy you?

@jeremyevans
Copy link
Contributor

Since Set seem to be the contentious point, another option is to shrink set_table, e.g.: master...Shopify:ruby:set-entries-bounds

Would that satisfy you?

I should be clear that I don't object to the change in this PR, even if it increases the size of Set. My comments were just discussing the potential impact on memory and whether issue could be mitigated. In general, you don't need Set for small sets (Array generally works for those), so the size of small Set is not critical.

That being said, assuming that the Set change doesn't significantly impact the performance of Set, it would be great to keep Set in an 80B slot. However, if that change works for Set, couldn't it also work for Hash and allow Hash to fit in 80B (if you decrease the ar table size from 8 to 3)?

@byroot
Copy link
Member Author

byroot commented Aug 7, 2025

allow Hash to fit in 80B (if you decrease the ar table size from 8 to 3)?

Interesting. I think you might be right that it would be possible, but then there's even more tradeoffs to consider.

@byroot
Copy link
Member Author

byroot commented Aug 10, 2025

It's a bit of an arbitrary application, but I decided to run the YJIT bench shipit benchmark with a modified Ruby to record how often generic ivars are accessed:

[["VM/thread", 4886971],
 ["T_STRING", 248114],
 ["T_HASH", 235517],
 ["SQLite3::Backup", 127320],
 ["xmlDoc", 24300],
 ["T_ARRAY", 9212],
 ["OpenSSL/Cipher", 4200],
 ["xmlNodeSet", 2700],
 ["xmlNode", 2700],
 ["T_SYMBOL", 533],
 ["time", 398],
 ["encoding", 363],
 ["T_REGEXP", 213],
 ["proc", 136],
 ["T_STRUCT", 74],
 ["set", 32],
 ["FFI::Type::Builtin", 21],
 ["FFI::StructField", 15],
 ["zstream", 8],
 ["OpenSSL/X509/STORE", 4],
 ["T_FLOAT", 4],
 ["Psych/parser", 3],
 ["msgpack:buffer_view", 2],
 ["FFI::Type::Mapped", 1],
 ["T_RATIONAL", 1],
 ["FFI::DynamicLibrary", 1],
 ["T_BIGNUM", 1],
]

Some do surprise me a bit (e.g. T_FLOAT, T_SYMBOL), but I suspect it's just some very generic codepath checking for ivars, and we should just bail out early for these types.

I'm quite surprised about T_HASH as well, I wasn't expecting it to be this common, I need to check where these comes from.

VM/Thread is most likely ActiveSupport::ExecutionState

Then there's a bunch of common C extensions (sqlite3, nokogiri, openssl).

@byroot
Copy link
Member Author

byroot commented Aug 10, 2025

Actually, revised number with only object that do actually have ivars. Since trying to access ivars on an object that doesn't have any bails out fast:

[["VM/thread", 4886969],
 ["T_HASH", 229501],
 ["SQLite3::Backup", 122531],
 ["T_STRING", 70597],
 ["xmlDoc", 23625],
 ["T_ARRAY", 9039],
 ["OpenSSL/Cipher", 2800],
 ["xmlNode", 2025],
 ["encoding", 358],
 ["time", 199],
 ["proc", 68],
 ["T_STRUCT", 38],
 ["OpenSSL/X509/STORE", 3],
 ["Psych/parser", 2],
 ["set", 1],

It makes much more sense.

@byroot
Copy link
Member Author

byroot commented Aug 10, 2025

So for T_HASH it seems to mostly come from Rack's HeaderHash, and a bit from ActiveSupport::OrderedOptions.

For T_STRING without even checking I'm sure it's String#html_safe, but I took care of that on Rails main a few weeks ago: rails/rails#55352

As for T_ARRAY, it seems to mostly be ActiveSupport::Inflector::Inflections::Uncountables.

@byroot
Copy link
Member Author

byroot commented Aug 10, 2025

Out of curiosity I also looked at the sequel benchmark:

[
  ["SQLite3::Backup", 4073197],
  ["T_ARRAY", 171011],
  ["T_STRING", 56]
}

Here T_ARRAY seem to come from the older version of sqlite3-ruby that used Array and Hash subclasses for its result sets, but it has been deprecated and removed recently, so that's good new: sparklemotion/sqlite3-ruby@002443e

Similar to f3206cc but for TypedData.

It's quite common for TypedData objects to have a mix of reference in
their struct and some ivars.
Since we do happen to have 8B free in the RtypedData struct, we could
use it to keep a direct reference to the IMEMO/fields saving having
to synchronize the VM and lookup the `gen_fields_tbl` on every ivar
access.

For old school Data classes however, we don't have free space, but
this API is soft-deprecated and no longer very common.
@byroot byroot force-pushed the rtypeddata-fields-obj branch from d29e93b to 32f2d5d Compare August 12, 2025 07:32

This comment has been minimized.

@byroot
Copy link
Member Author

byroot commented Aug 12, 2025

@jeremyevans I opened #14179 to shrink sets back, but figured an independent PR was best.

Copy link
Member

@peterzhu2118 peterzhu2118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

@byroot byroot merged commit 360be94 into ruby:master Aug 12, 2025
84 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants