-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Embed const caches that aren't cref dependent #12900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Actually, it might not be, because it doesn't seem to use const caches that don't have a cref. |
224da92
to
0a68a85
Compare
✅ All Tests passed!✖️no tests failed ✔️61905 tests passed(3 flakes) |
c85dd5e
to
0a0b09a
Compare
@ruby/yjit would someone have a bit of time to help me update YJIT to handle the new constcache layout? I suspect I need to check for the |
3dad53e
to
eea34d8
Compare
vm_insnhelper.c
Outdated
const rb_cref_t *cref = vm_get_const_key_cref(reg_ep); | ||
if (cref) { | ||
vm_icc_set_flag(ic, CONST_CACHE_HAS_EXT); | ||
struct iseq_inline_constant_cache_entry *ice = IMEMO_NEW(struct iseq_inline_constant_cache_entry, imemo_constcache, 0); | ||
RB_OBJ_WRITE(iseq, &ic->c.ext, ice); | ||
RB_OBJ_WRITE(ic->c.ext, &ice->value, val); | ||
ice->ic_cref = cref; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's good info thank you. I'll see if I can preserve the thread safety. I think with a bit of care it is possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, actually, not so sure, the problem is really that both RACTOR_SHAREABLE
flag and the VALUE
have to be set atomically, so short of having access to 128bit atomics, I don't see how it could be achieved 😢
Reopening, because this might actually be possible using |
vm_core.h
Outdated
@@ -269,22 +273,91 @@ STATIC_ASSERT(sizeof_iseq_inline_constant_cache_entry, | |||
sizeof(const rb_cref_t *)) <= RVALUE_SIZE); | |||
|
|||
struct iseq_inline_constant_cache { | |||
struct iseq_inline_constant_cache_entry *entry; | |||
union { | |||
struct iseq_inline_constant_cache_entry *ext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "ext" stand for? "Ext"ended? I find this name confusing.
Could we just call this extended
and rename the flag to CONST_CACHE_HAS_EXTENDED
(if that's what it means)? Or maybe heap
and CONST_CACHE_IN_HEAP
(or CONST_CACHE_EMBEDDED
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that what it stand for. If I managed to make this work, I'll rename it as you suggested.
@XrXr I think I pushed a new version that work with ractors, but I'm not super accustomed to all the constraints so I'd really appreciate if you could review this. How it works now:
Here I somewhat assume How does that sound? |
vm_insnhelper.c
Outdated
const rb_cref_t *cref = vm_get_const_key_cref(reg_ep); | ||
if (RB_UNLIKELY(cref)) { | ||
struct iseq_inline_constant_cache_entry *ice = IMEMO_NEW(struct iseq_inline_constant_cache_entry, imemo_constcache, 0); | ||
RB_OBJ_WRITE(ice, &ice->value, val); | ||
ice->ic_cref = cref; | ||
cache_val = (VALUE)ice; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add a short comment to explain what this bit does. What does it mean for cref to be null or not null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think letting CONST_CACHE_SHAREABLE
race this way is correct. Since it's not set atomically with the cached value, it's possible to observe a stale set flag from another thread, for a value that's actually not sharable, allowing for smuggling of values between ractors without moving.
😞, do you have any easy to understand literature you'd recommend on this topic? I'd like to better understand the constraints here. While I get the write are not atomic, I don't understand if they're ordered, what's the granularity etc. Alternatively I looked at 128bit atomics, but from what I could find it seems support is rather erratic. |
Sorry, I misspoke. This is actually about atomicity, not ordering. The possible schedule is this:
Generally if two states are not set together atomically, all possible combinations of the two are observable. So even if you put clear before setting the value, a smuggling schedule is still possible. Sorry, I know you two put effort into this and it's not fun to learn about the undocumented intentions, especially since the current code is not 100% correct either. Because it's not using atomics, the writes during a cache fill can be observed in any order, including one where the other thread sees the imemo pointer, but not the initialized contents of the pointed to imemo. (This one is actually about ordering.) Memory ordering is a complex topic and takes a while to build up intuition. This is one of blog posts I read: https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ but I learned from a bunch of other sources. Another phrase to search around for is "Acquire-Release semantics". |
Ok, I think I understand what you mean, and thank you for the link. If I understood the link correctly, given that the cache is only ever cleared from the main ractor (these are const cache), and AFAIK with the global lock held, we could clear the flag before clearing the value: vm_icc_reset(*ic)
{
cc->tagged_segments &= ~CONST_CACHE_FLAGS_MASK;
// Insert StoreStore fence (not sure how to do that)
cc->value = Qundef;
} This way, it's still possible to see a sharable value without the corresponding flag, but never an unsharable value with the flag. no? If I'm still incorrect just say so, it's ok, I don't want to take your time any further. |
No, smuggling is still possible since the race happens after the fence, when threads compete to fill the cache. So basically the same schedule is possible:
Actually "clear cache" in the older schedule is a bit misleading since it's not really relevant. It's just that there is a window between setting the value and the flag. |
Ok, thank you. So short of having 128bit atomics, or of packing that bit in the |
You could use a spin lock to synchronize fills and reads I guess, but that sacrifices single threaded speed. |
Or I could inline them only if no other ractor has been spawned yet? |
That could work. Since there's no way to transition from embedded to not without 128B atomics, you can treat all embedded caches as busts in multi-ractor mode and fill and clear the flag. And make it so only share-ability claims in imemos are valid. That sounds like it irons out all the inconsistent states. It's hard to reason about, though, and I might be missing something. |
I don't think you need to though. You can keep the embeded caches even after a second ractor has been spawned. |
Ah right. No one writes embedded caches anymore after a ractor spawns. |
Now `ic->value` is set with `RUBY_ATOMIC_VALUE_CAS`. If we fail to write the value, we don't set the `CONST_CACHE_SHAREABLE` flag either. This leave a small race condition window during which `ic->value` is set, but the `CONST_CACHE_SHAREABLE` glaf hasn't been assigned. However, given in such case we'd throw an error, it's OK performance wise to call `rb_ractor_shareable_p` to double check before raising the error.
0d02e6a
to
b87dc3f
Compare
Opening this as a draft to get some CI coverage. I already know RJIT is broken.
Currently all
iseq_inline_constant_cache
have extra storage inside an IMEMO/constcache object.That extension object stores one flag (so one bit), the reference to the cached value (8B) and the
cref
pointer (8B).However the vast majority of const caches don't have a
cref
, which means we're allocating40B
, and use8B
to reference it, to store8B
plus 1 bit.Given that
iseq_inline_constant_cache.segments
is a pointer, we can use the lower 3 bits to store tags.The idea here is to merge
iseq_inline_constant_cache_entry.flags
intoiseq_inline_constant_cache.segments
lower bits,and also store an extra flag that records whether this cache is
cref
dependent.When it's not (which is the vast majority of cases) we can store the cached value directly inside the inline cache, and not need the
_entry
object.Expectations:
IMEMO/constcache
instances are eliminated.cc @etiennebarrie