RTypedData: keep direct reference to IMEMO/fields #14134

byroot · 2025-08-06T18:17:13Z

Similar to f3206cc but for TypedData.

It's quite common for TypedData objects to have a mix of reference in their struct and some ivars.
Since we do happen to have 8B free in the RtypedData struct, we could use it to keep a direct reference to the IMEMO/fields saving having to synchronize the VM and lookup the gen_fields_tbl on every ivar access.

For old school Data classes however, we don't have free space, but this API is soft-deprecated and no longer very common.

cc @tenderlove as mentioned.

jeremyevans · 2025-08-06T18:44:41Z

I think this may cause issues for embedded RTypedData, as the extra space is used for the embedded data. Also, a lot of code depends on DATA_PTR(obj) working both when obj is RData and non-embedded RTypedData, so moving fields around is going to be problematic. I found this out when working on ce51ef3.

byroot · 2025-08-06T18:47:10Z

I think this may cause issues for embedded RTypedData, as the extra space is used for the embedded data.

It's a possibility, and I admit I didn't go over them all to verify, but the overwhelming majority of them likely have the extra space and won't be bumped to the next slot. e.g. Set has 8B free.

so moving fields around is going to be problematic.

Yes, my intent is for them to keep being at the same offset, seems like I got it wrong, but it's the plan that they stay that way.

jeremyevans · 2025-08-06T18:51:38Z

It's a possibility, and I admit I didn't go over them all to verify, but the overwhelming majority of them likely have the extra space and won't be bumped to the next slot. e.g. Set has 8B free.

I wonder how Set ended up with 8B free. It was right at the limit when ce51ef3 was committed. If it already had 8B free, I wouldn't have needed to implement ce51ef3 to free up 8B so it would fit in an 80B slot.

byroot · 2025-08-06T18:54:03Z

I wonder how Set ended up with 8B free. I

My bad. Seems like I miscounted.

byroot · 2025-08-06T19:13:14Z

my intent is for them to keep being at the same offset, seems like I got it wrong

Note to self. The offset still match AFAICT. The crash is with T_ZOMBIE, there might be a mismatch there.

jeremyevans · 2025-08-06T19:15:08Z

For the non-embedded RTypedData and for embedded RTypedData with 8B free, this seems a clear win as the space is unused. For the embedded RTypedData with less than 8B free, this doubles the amount of space the objects take. At that point, it becomes a memory vs CPU tradeoff. You could handle that case by using one of the flag bits in the pointer, so it can be an per-object/type decision:

// in include/ruby/internal/core/rtypeddata.h
#define TYPED_DATA_EMBEDDED_FIELDS ((VALUE)4)
#define TYPED_DATA_PTR_FLAGS ((VALUE)7)

I think that's the last free flag bit on 32-bit arch. Alternatively, you could make a decision based on size of the embedded object (assuming you can get that easily), and only embed if the object has more than 8B free in the object slot. This would also increase complexity, because you would need to handle the slower fallback case. I assume the fallback code is the current code, so this is would involve moving the new code to an if branch, and keeping the current code in an else branch.

Assuming you don't want to go that route due to the complexity it would entail, how many RTypedData actually use ivars? You mention that it is "quite common for TypedData", but I don't know if that is 50%, 5%, or something else. Is it possible to include a benchmark that shows the CPU speedup from this approach, so we can compare it to the potential doubling of space for certain objects when making a decision regarding the tradeoff?

byroot · 2025-08-06T19:23:35Z

how many RTypedData actually use ivars?

It's all gonna depend on what gems you run. But even in Ruby-core, there are a bunch of typed data that use ivars, e.g. Enumerator. Then in gems, you have quite a bunch in OpenSSL, etc.

Beyond how much it actually is used, I like the idea of not penalizing the pattern of doing a few core functionality in C, and then extending the class from Ruby like we do in Trilogy: https://github.com/trilogy-libraries/trilogy/blob/029db7d874b9d09d3403e4c2418802f5d54e2c4c/contrib/ruby/lib/trilogy.rb

Is it possible to include a benchmark that shows the CPU speedup from this approach,

I included a benchmark in my previous PR where I did something similar for T_STRUCT, it's quite significant: f3206cc, and that's a single threaded benchmark. If you start a Ractor, Ruby then need to acquire the VM lock to access the object's ivars.

casperisfine · 2025-08-07T09:33:41Z

Ok, so I figured what the bug was, I forgot to update rb_data_object_wrap( to account for the fields reordering.

Now for the size. The types that would be bumped:

Set as discussed (I didn't notice because I was checking in debug mode, and in debug mode size 2 is 88B...)
Process::Status. Used to fit in 40B, because the struct is 16B.
frame_info AKA Thread::Backtrace::Location. Also used to be 40B, would now be 80B.

There's also pinned_list, but this one is of dynamic size, so it's not really the same.

Out of all of those, Process::Status and frame_info tend to be very temporary objects, and it's rare to allocate a ton of them, so to me it's a rounding error.

Set is a bit more annoying, and I wish it would still fit in 80B, but in 3.4 it was 200B (40B T_OBJECT + 160B T_HASH), so it's not like it's really a regression.

Of course as you mentioned there's always the possibility of having yet another flag to make this a conditional, but it's very tricky as it means different offsets based on the flags, so would be extremely finicky to get it working.

casperisfine · 2025-08-07T11:45:01Z

Since Set seem to be the contentious point, another option is to shrink set_table, e.g.: master...Shopify:ruby:set-entries-bounds

Would that satisfy you?

jeremyevans · 2025-08-07T14:50:36Z

Since Set seem to be the contentious point, another option is to shrink set_table, e.g.: master...Shopify:ruby:set-entries-bounds

Would that satisfy you?

I should be clear that I don't object to the change in this PR, even if it increases the size of Set. My comments were just discussing the potential impact on memory and whether issue could be mitigated. In general, you don't need Set for small sets (Array generally works for those), so the size of small Set is not critical.

That being said, assuming that the Set change doesn't significantly impact the performance of Set, it would be great to keep Set in an 80B slot. However, if that change works for Set, couldn't it also work for Hash and allow Hash to fit in 80B (if you decrease the ar table size from 8 to 3)?

byroot · 2025-08-07T15:23:50Z

allow Hash to fit in 80B (if you decrease the ar table size from 8 to 3)?

Interesting. I think you might be right that it would be possible, but then there's even more tradeoffs to consider.

byroot · 2025-08-10T16:35:21Z

It's a bit of an arbitrary application, but I decided to run the YJIT bench shipit benchmark with a modified Ruby to record how often generic ivars are accessed:

[["VM/thread", 4886971],
 ["T_STRING", 248114],
 ["T_HASH", 235517],
 ["SQLite3::Backup", 127320],
 ["xmlDoc", 24300],
 ["T_ARRAY", 9212],
 ["OpenSSL/Cipher", 4200],
 ["xmlNodeSet", 2700],
 ["xmlNode", 2700],
 ["T_SYMBOL", 533],
 ["time", 398],
 ["encoding", 363],
 ["T_REGEXP", 213],
 ["proc", 136],
 ["T_STRUCT", 74],
 ["set", 32],
 ["FFI::Type::Builtin", 21],
 ["FFI::StructField", 15],
 ["zstream", 8],
 ["OpenSSL/X509/STORE", 4],
 ["T_FLOAT", 4],
 ["Psych/parser", 3],
 ["msgpack:buffer_view", 2],
 ["FFI::Type::Mapped", 1],
 ["T_RATIONAL", 1],
 ["FFI::DynamicLibrary", 1],
 ["T_BIGNUM", 1],
]

Some do surprise me a bit (e.g. T_FLOAT, T_SYMBOL), but I suspect it's just some very generic codepath checking for ivars, and we should just bail out early for these types.

I'm quite surprised about T_HASH as well, I wasn't expecting it to be this common, I need to check where these comes from.

VM/Thread is most likely ActiveSupport::ExecutionState

Then there's a bunch of common C extensions (sqlite3, nokogiri, openssl).

byroot · 2025-08-10T16:40:31Z

Actually, revised number with only object that do actually have ivars. Since trying to access ivars on an object that doesn't have any bails out fast:

[["VM/thread", 4886969],
 ["T_HASH", 229501],
 ["SQLite3::Backup", 122531],
 ["T_STRING", 70597],
 ["xmlDoc", 23625],
 ["T_ARRAY", 9039],
 ["OpenSSL/Cipher", 2800],
 ["xmlNode", 2025],
 ["encoding", 358],
 ["time", 199],
 ["proc", 68],
 ["T_STRUCT", 38],
 ["OpenSSL/X509/STORE", 3],
 ["Psych/parser", 2],
 ["set", 1],

It makes much more sense.

byroot · 2025-08-10T16:49:13Z

So for T_HASH it seems to mostly come from Rack's HeaderHash, and a bit from ActiveSupport::OrderedOptions.

For T_STRING without even checking I'm sure it's String#html_safe, but I took care of that on Rails main a few weeks ago: rails/rails#55352

As for T_ARRAY, it seems to mostly be ActiveSupport::Inflector::Inflections::Uncountables.

byroot · 2025-08-10T16:56:50Z

Out of curiosity I also looked at the sequel benchmark:

[
  ["SQLite3::Backup", 4073197],
  ["T_ARRAY", 171011],
  ["T_STRING", 56]
}

Here T_ARRAY seem to come from the older version of sqlite3-ruby that used Array and Hash subclasses for its result sets, but it has been deprecated and removed recently, so that's good new: sparklemotion/sqlite3-ruby@002443e

Similar to f3206cc but for TypedData. It's quite common for TypedData objects to have a mix of reference in their struct and some ivars. Since we do happen to have 8B free in the RtypedData struct, we could use it to keep a direct reference to the IMEMO/fields saving having to synchronize the VM and lookup the `gen_fields_tbl` on every ivar access. For old school Data classes however, we don't have free space, but this API is soft-deprecated and no longer very common.

byroot · 2025-08-12T09:50:28Z

@jeremyevans I opened #14179 to shrink sets back, but figured an independent PR was best.

variable.c

peterzhu2118

Makes sense to me

byroot force-pushed the rtypeddata-fields-obj branch from ddeb5e9 to be6201e Compare August 6, 2025 18:47

byroot force-pushed the rtypeddata-fields-obj branch from be6201e to d29e93b Compare August 7, 2025 09:24

byroot force-pushed the rtypeddata-fields-obj branch from d29e93b to 32f2d5d Compare August 12, 2025 07:32

This comment has been minimized.

Sign in to view

byroot force-pushed the rtypeddata-fields-obj branch from 32f2d5d to 9f0f180 Compare August 12, 2025 08:29

byroot mentioned this pull request Aug 12, 2025

set.c: Store set_table->bins at the end of set_table->entries #14179

Merged

byroot requested review from peterzhu2118 and jeremyevans August 12, 2025 09:49

peterzhu2118 reviewed Aug 12, 2025

View reviewed changes

variable.c Show resolved Hide resolved

peterzhu2118 approved these changes Aug 12, 2025

View reviewed changes

byroot merged commit 360be94 into ruby:master Aug 12, 2025
84 checks passed

RTypedData: keep direct reference to IMEMO/fields #14134

RTypedData: keep direct reference to IMEMO/fields #14134

Conversation

byroot commented Aug 6, 2025

Uh oh!

jeremyevans commented Aug 6, 2025

Uh oh!

byroot commented Aug 6, 2025

Uh oh!

jeremyevans commented Aug 6, 2025

Uh oh!

byroot commented Aug 6, 2025

Uh oh!

byroot commented Aug 6, 2025

Uh oh!

jeremyevans commented Aug 6, 2025

Uh oh!

byroot commented Aug 6, 2025

Uh oh!

casperisfine commented Aug 7, 2025

Uh oh!

casperisfine commented Aug 7, 2025

Uh oh!

jeremyevans commented Aug 7, 2025

Uh oh!

byroot commented Aug 7, 2025

Uh oh!

byroot commented Aug 10, 2025

Uh oh!

byroot commented Aug 10, 2025

Uh oh!

byroot commented Aug 10, 2025

Uh oh!

byroot commented Aug 10, 2025

Uh oh!

This comment has been minimized.

byroot commented Aug 12, 2025

Uh oh!

Uh oh!

peterzhu2118 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!