-
Notifications
You must be signed in to change notification settings - Fork 5.4k
YJIT: Allow inlining ISEQ calls with a block #9622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
09b9d49
to
92acf33
Compare
92acf33
to
5357897
Compare
I don't quite understand how this PR works. Please bear with me. The code for send is so complex that I find it hard to follow sometimes. You say it's duplicating the ISEQ for each block. Conceptually, I understand what that means. However, I don't understand why this has an impact on the maximum number of versions in the JIT. Are multiple entry blocks being generated for the ISEQs based on the block given in the call? I guess I'm wondering: would it be possible to somehow duplicate the ISEQ directly at the interpreter level? That might be slightly less complex on the JIT side, and it would avoid having to add extra state to the context? |
Whenever it duplicates each basic block of Say, we have: module Kernel
def tap
yield(self) # basic block 1: putself + invokeblock (version a, version b)
self # basic block 2: putself + leave (version a, version b)
end
end
1.tap { :a }
2.tap { :b }
They would be located linearly in the generated code. It's basically inlining except the frame push/pop for
If "entry blocks" means This PR duplicates blocks of
Technically, it's possible if you rewrite how the interpreter does method lookup. The lookup is currently based on It would also mean to add extra memory overhead to allocate more ISEQs and add more GC pressure because of them. So, yes, it's possible, but harder to implement, uses more memory, and makes things slower in unrelated places. |
// If the callee has :inline_block annotation and the callsite has a block ISEQ, | ||
// duplicate a callee block for each block ISEQ to make its `yield` monomorphic. | ||
if let (Some(BlockHandler::BlockISeq(iseq)), true) = (block, builtin_attrs & BUILTIN_ATTR_INLINE_BLOCK != 0) { | ||
callee_ctx.set_inline_block(iseq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is the magic sauce. Because the callee_ctx is probably doesn't match anything that currently exists, it's going to generate a new block directly following the call.
This PR introduces an annotation
Primitive.attr! :inline_block
to allow monomorphizingyield
in methods likeArray#each
#9533.yjit-bench
This PR or
Array#each
in Ruby (#9533) doesn't have a significant impact on benchmarks by itself.However, combining this PR and
Array#each
in Ruby with the annotation (branch), it gives the following speedups.hexapdf
is 7% fasterchunky-png
andruby-lsp
are 2% fasteractiverecord
,liquid-c
, andrailsbench
are 1% fasterSee also: the number of Ruby block calls from
Array#each
Shopify/yjit-bench#168Memory usage
Running 25 itrs on
lobsters
with this PR andArray#each
using the annotation (branch),code_region_size
was increased by 1.2%, andyjit_alloc_size
was increased by 4.5%.For optimizing the size of
Context
, I'm thinking of having a globalHashMap
like{ iseq1 => 1, iseq2 => 2, ... }
and store anu16
integer instead of anu64
ISEQ pointer as a next step, which would likely decreaseyjit_alloc_size
.Before
After
microbenchmark
This PR makes the synthetic benchmark of megamorphic
yield
inInteger#times
4.5x faster.