-
Notifications
You must be signed in to change notification settings - Fork 5.4k
YJIT: add codegen for String#setbyte #9767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fn rb_str_setbyte(str: VALUE, index: VALUE, value: VALUE) -> VALUE; | ||
} | ||
|
||
// Raises when index is out of range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we okay with breaking the backtrace when YJIT is enabled? We discussed this exact problem on Thursday and you suggested to put a guard on the index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the flip side, you could probably remove jit_prepare_routine_call
if you guard it, so it would allow you to keep local types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd also need to guard for frozen strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method can still raise though? You're saying it won't report the method raising correctly? Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't say it can't raise. I only talked about the backtrace.
master:
$ ruby -v --yjit-call-threshold=1 -e '"".setbyte(1, 0)'
ruby 3.4.0dev (2024-01-30T19:59:53Z master c1f8d974a8) +YJIT [x86_64-linux]
-e:1:in `setbyte': index 1 out of string (IndexError)
from -e:1:in `<main>'
this branch:
$ ruby -v --yjit-call-threshold=1 -e '"".setbyte(1, 0)'
ruby 3.4.0dev (2024-01-30T20:39:42Z yjit_setbyte 247ee663df) +YJIT [x86_64-linux]
-e:1:in `<main>': index 1 out of string (IndexError)
With this change, you can't know which method is raising the exception. If a line has a chain of method calls, it'd be harder to tell whether it's from setbyte
or not. And this problem will happen only after YJIT compiles it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'd preserve the backtrace, but at the same time, this can really get in the way of performance optimizations. How do you guys feel about it?
In theory, it may be possible to add JIT support for recovering a correct backtrace based on the PC (we do set up the PC in jit_prepare_routine_call
) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel uneasy about messing with the backtrace since it's user facing. More people should probably be brought into the discussion if we really want to do it.
With stuff like send
and alias methods, it's hard to recover the callee from the PC alone, since that just points you to the callsite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok hmmm.
I feel like if we have to guard that the index is fixnum and in range, and that the string is also frozen, it really nerfs this PR because we're adding a fair bit of extra work on top of a C call that will duplicate the guards we just did.
So this opens up at least 2 discussion threads:
- Can we find a way to recover the backtrace for C calls based on the PC only, without needing to push a Ruby frame?
- How easy it it to guard that a string is not frozen, and that the string has binary encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For (2), if we're going to guard, it's probably faster and cleaner to write a version of the C function that returns a bool instead of raising, that way we can just call it without preparation and we don't repeat any guards. With the guards, it's really close to actually setting the byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR, Alan's idea on (2) seems reasonable to me.
- Can we find a way to recover the backtrace for C calls based on the PC only, without needing to push a Ruby frame?
For a long term, we should aim for having this capability. We have so many almost-leaf methods that are not leaf only because they raise. Many C methods need to be inlining-safe when we start inlining non-trivial ISEQs.
We could have a YJIT-global hash map { PC => C method }
, make sure it's currently the only callee from the PC on YJIT (otherwise invalidate the existing block), and resurrect the frame on rb_raise
if the top-most frame's PC is in the hash map.
If the same ISEQ is called from the interpreter for any reason, another frame (setbyte
or another method, not the caller we're compiling) would be the top-most frame, so it wouldn't doubly push the frame.
Before:
After: