get_next_shape_internal: Skip VM lock for single child case #13191

casperisfine · 2025-04-28T08:40:46Z

If the shape has only one child, we check it lock-free without compromising thread safety.

I haven't computed hard data as to how often that it the case, but we can assume that it's not too rare for shapes to have a single child that is often requested, typically when freezing and object.

launchable-app · 2025-04-28T09:15:09Z

✅ All Tests passed!

✖️no tests failed ✔️61968 tests passed(3 flakes)

casperisfine · 2025-04-28T15:13:14Z

I haven't computed hard data as to how often that it the case

I did just that and it seems my assumption holds pretty well. From a big monolith:

1 children: 35949 shapes
2 children: 1661 shapes
3 children: 438 shapes
4 children: 159 shapes
5 children: 80 shapes
6 children: 60 shapes
7 children: 39 shapes
8 children: 28 shapes
9 children: 18 shapes
10 children: 15 shapes
11 children: 15 shapes
12 children: 10 shapes

Overall 72.8% of shapes have a single child.

jhawthorn · 2025-04-28T17:58:17Z

shape.c

@@ -499,13 +499,23 @@ get_next_shape_internal(rb_shape_t * shape, ID id, enum shape_type shape_type, b

    *variation_created = false;

+    // Fast path: if the shape has a single child, we can check it without a lock
+    struct rb_id_table *edges = shape->edges;


I think we should use atomics here and below where we write shape->edges. Otherwise the compiler could rearrange the initialization of the shape until after the pointer is written and on some platforms the hardware could do the same memory reordering.

There is definitely elsewhere in Ruby we're doing the same thing (ex. inline caches) and apparently not seeing issues, but maybe that is due to insufficient testing on platforms with weaker memory models (arm), and luck that the compiler doesn't choose to rearrange too much/the hardware isn't running "concurrently enough" to hit the issue 🤷‍♂️.

It looks like we have ATOMIC_PTR_LOAD, but not ATOMIC_PTR_SET. I implemented ATOMIC_VALUE_SET here, which really should be the same thing. TBH I'm not sure why we have so much duplication in ruby/atomic.h where I think we know some of these types to be identical.

Thanks for the comment. I think I understand the concern.

If I understand it correctly, this is mostly setting the pointer that need to use atomic, so that we guaranteed the shape content as fully been written.

I'm not sure I understand why we'd need an atomic load though. Presumably if we can see the pointer in shape->edges, the child shape content should be flushed, and we'd see it consistently too.

The lazy answer is that the spec says we have to, otherwise it's UB and TSan will complain if we don't. This read of shape->edges could be combined with the one inside the lock (so we would see stale information after taking the lock). But I do think even aside from that it's still necessary to use an atomic.

I'm not 100% on this, but let me try for the less lazy answer 😅. My understanding of the memory model is that if we want writes from one thread to all be visible in another we need the writer to do an atomic operation with at least memory_order_release and then have it read with at least memory_order_acquire (all Ruby's atomics use seq_cst, which should just be stricter). The GCC docs link to a wiki page with some examples.

On x86 I think what you wrote about the content through the pointer being flushed is true (all stores are equivalent to "release" and all loads are equivalent to "acquire"), but on ARM's weaker memory model I don't think it is we end up with different instructions when using atomics and I think those are necessary for the "happens before" guarantee.

I tried making a program to show this happening but did not succeed 😅. So maybe there's something I don't understand about this or the cache lines didn't line up.

If the shape has only one child, we check it lock-free without compromising thread safety. I haven't computed hard data as to how often that it the case, but we can assume that it's not too rare for shapes to have a single child that is often requested, typically when freezing and object.

casperisfine · 2025-04-29T07:42:36Z

@jhawthorn alright, I did what you suggested, hopefully this is correct now.

casperisfine · 2025-04-29T07:56:44Z

shape.c

@@ -499,13 +499,27 @@ get_next_shape_internal(rb_shape_t * shape, ID id, enum shape_type shape_type, b

    *variation_created = false;

+    // Fast path: if the shape has a single child, we can check it without a lock
+    struct rb_id_table *edges = RUBY_ATOMIC_PTR_LOAD(shape->edges);
+    if (edges && SINGLE_CHILD_P(edges)) {


I think in the case where edges is NULL we could also allocate the child atomically, and set it with CAS.

Not sure if it's really worth it though, because supposedly code running in ractors produce stable shapes, so write would become rare quickly.

I agree we could do that, an issue I think that would create though is that if two Ractors are trying to insert the exact same shape, we would end up creating two shapes for it, for one Ractor the CAS would fail (as is normal with this pattern), but then there'd be no ability to free the shape it had created, since they are never freed (though I suppose we could create a freelist specifically for this case).

byroot requested review from tenderlove and jhawthorn April 28, 2025 08:41

jhawthorn reviewed Apr 28, 2025

View reviewed changes

casperisfine force-pushed the lockfree-single-child-shape branch from 0b2ebf0 to 3a49e76 Compare April 29, 2025 07:41

casperisfine commented Apr 29, 2025

View reviewed changes

byroot requested a review from jhawthorn April 29, 2025 16:35

jhawthorn approved these changes Apr 30, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_next_shape_internal: Skip VM lock for single child case #13191

get_next_shape_internal: Skip VM lock for single child case #13191

casperisfine commented Apr 28, 2025

launchable-app bot commented Apr 28, 2025 •

edited

Loading

casperisfine commented Apr 28, 2025

jhawthorn Apr 28, 2025

byroot Apr 28, 2025

jhawthorn Apr 29, 2025 •

edited

Loading

casperisfine commented Apr 29, 2025

casperisfine Apr 29, 2025

jhawthorn Apr 30, 2025

get_next_shape_internal: Skip VM lock for single child case #13191

Are you sure you want to change the base?

get_next_shape_internal: Skip VM lock for single child case #13191

Conversation

casperisfine commented Apr 28, 2025

launchable-app bot commented Apr 28, 2025 • edited Loading

✅ All Tests passed!

casperisfine commented Apr 28, 2025

jhawthorn Apr 28, 2025

Choose a reason for hiding this comment

byroot Apr 28, 2025

Choose a reason for hiding this comment

jhawthorn Apr 29, 2025 • edited Loading

Choose a reason for hiding this comment

casperisfine commented Apr 29, 2025

casperisfine Apr 29, 2025

Choose a reason for hiding this comment

jhawthorn Apr 30, 2025

Choose a reason for hiding this comment

launchable-app bot commented Apr 28, 2025 •

edited

Loading

jhawthorn Apr 29, 2025 •

edited

Loading