Skip to content

get_next_shape_internal: Skip VM lock for single child case #13191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

casperisfine
Copy link
Contributor

If the shape has only one child, we check it lock-free without compromising thread safety.

I haven't computed hard data as to how often that it the case, but we can assume that it's not too rare for shapes to have a single child that is often requested, typically when freezing and object.

@byroot byroot requested review from tenderlove and jhawthorn April 28, 2025 08:41
Copy link

launchable-app bot commented Apr 28, 2025

All Tests passed!

✖️no tests failed ✔️61968 tests passed(3 flakes)

@casperisfine
Copy link
Contributor Author

I haven't computed hard data as to how often that it the case

I did just that and it seems my assumption holds pretty well. From a big monolith:

1 children: 35949 shapes
2 children: 1661 shapes
3 children: 438 shapes
4 children: 159 shapes
5 children: 80 shapes
6 children: 60 shapes
7 children: 39 shapes
8 children: 28 shapes
9 children: 18 shapes
10 children: 15 shapes
11 children: 15 shapes
12 children: 10 shapes

Overall 72.8% of shapes have a single child.

shape.c Outdated
@@ -499,13 +499,23 @@ get_next_shape_internal(rb_shape_t * shape, ID id, enum shape_type shape_type, b

*variation_created = false;

// Fast path: if the shape has a single child, we can check it without a lock
struct rb_id_table *edges = shape->edges;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use atomics here and below where we write shape->edges. Otherwise the compiler could rearrange the initialization of the shape until after the pointer is written and on some platforms the hardware could do the same memory reordering.

There is definitely elsewhere in Ruby we're doing the same thing (ex. inline caches) and apparently not seeing issues, but maybe that is due to insufficient testing on platforms with weaker memory models (arm), and luck that the compiler doesn't choose to rearrange too much/the hardware isn't running "concurrently enough" to hit the issue 🤷‍♂️.

It looks like we have ATOMIC_PTR_LOAD, but not ATOMIC_PTR_SET. I implemented ATOMIC_VALUE_SET here, which really should be the same thing. TBH I'm not sure why we have so much duplication in ruby/atomic.h where I think we know some of these types to be identical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. I think I understand the concern.

If I understand it correctly, this is mostly setting the pointer that need to use atomic, so that we guaranteed the shape content as fully been written.

I'm not sure I understand why we'd need an atomic load though. Presumably if we can see the pointer in shape->edges, the child shape content should be flushed, and we'd see it consistently too.

Copy link
Member

@jhawthorn jhawthorn Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lazy answer is that the spec says we have to, otherwise it's UB and TSan will complain if we don't. This read of shape->edges could be combined with the one inside the lock (so we would see stale information after taking the lock). But I do think even aside from that it's still necessary to use an atomic.

I'm not 100% on this, but let me try for the less lazy answer 😅. My understanding of the memory model is that if we want writes from one thread to all be visible in another we need the writer to do an atomic operation with at least memory_order_release and then have it read with at least memory_order_acquire (all Ruby's atomics use seq_cst, which should just be stricter). The GCC docs link to a wiki page with some examples.

On x86 I think what you wrote about the content through the pointer being flushed is true (all stores are equivalent to "release" and all loads are equivalent to "acquire"), but on ARM's weaker memory model I don't think it is we end up with different instructions when using atomics and I think those are necessary for the "happens before" guarantee.

I tried making a program to show this happening but did not succeed 😅. So maybe there's something I don't understand about this or the cache lines didn't line up.

If the shape has only one child, we check it lock-free without
compromising thread safety.

I haven't computed hard data as to how often that it the case,
but we can assume that it's not too rare for shapes to have
a single child that is often requested, typically when freezing
and object.
@casperisfine casperisfine force-pushed the lockfree-single-child-shape branch from 0b2ebf0 to 3a49e76 Compare April 29, 2025 07:41
@casperisfine
Copy link
Contributor Author

@jhawthorn alright, I did what you suggested, hopefully this is correct now.

@@ -499,13 +499,27 @@ get_next_shape_internal(rb_shape_t * shape, ID id, enum shape_type shape_type, b

*variation_created = false;

// Fast path: if the shape has a single child, we can check it without a lock
struct rb_id_table *edges = RUBY_ATOMIC_PTR_LOAD(shape->edges);
if (edges && SINGLE_CHILD_P(edges)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the case where edges is NULL we could also allocate the child atomically, and set it with CAS.

Not sure if it's really worth it though, because supposedly code running in ractors produce stable shapes, so write would become rare quickly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we could do that, an issue I think that would create though is that if two Ractors are trying to insert the exact same shape, we would end up creating two shapes for it, for one Ractor the CAS would fail (as is normal with this pattern), but then there'd be no ability to free the shape it had created, since they are never freed (though I suppose we could create a freelist specifically for this case).

@byroot byroot requested a review from jhawthorn April 29, 2025 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants