ZJIT: Use Vec instead of HashMap for profiling #13809
Conversation
Force-pushed from 005a81b to 42ffd3c
Now I just have to remove the |
zjit/src/profile.rs (Outdated)

```rust
}

impl IseqProfile {
    pub fn new(iseq: IseqPtr) -> Self {
        // Pre-size all the operand slots in the opnd_types table so profiling is as fast as possible
```
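A hedged sketch of what this pre-sizing approach looks like in isolation (the type names, field names, and the `new` signature here are illustrative stand-ins, not ZJIT's actual code): every instruction's operand slots are allocated once at `IseqProfile` construction time, so the hot profiling path never grows a vector.

```rust
// Illustrative stand-in for ZJIT's per-operand profiled type info.
#[derive(Clone, Default, PartialEq, Debug)]
struct ProfiledType(u8);

struct IseqProfile {
    // One slot vector per instruction index in the ISEQ.
    opnd_types: Vec<Vec<ProfiledType>>,
}

impl IseqProfile {
    // `operand_counts[i]` is the number of operands instruction `i` profiles
    // (a hypothetical parameter; the real code derives this from the ISEQ).
    fn new(operand_counts: &[usize]) -> Self {
        // Start each entry as an empty vec, then resize it up front so the
        // hot path can index directly without any reallocation.
        let mut opnd_types: Vec<Vec<ProfiledType>> = vec![vec![]; operand_counts.len()];
        for (insn_idx, &n) in operand_counts.iter().enumerate() {
            opnd_types[insn_idx].resize(n, ProfiledType::default());
        }
        Self { opnd_types }
    }
}
```

The cost is paid once per ISEQ, including for instructions that are never profiled.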
I didn't quite understand how the pre-sizing helps here. You're still giving `vec![]` as the initial value to each vector and then resizing each one.
If we instead resize a vector only the first time we insert at its index, it's probably not slower, is it? That way we skip resizing the vectors for unused instructions, so not pre-sizing could be faster and save memory, IIUC.
This pre-sizing happens once per ISEQ per VM.
The previous version checked, on the hot path, once per profile per opcode per ISEQ whether the vec was the right size.
It was an additional 100-200ms savings, I think.
I get that the current version is faster, but I'm only suggesting that the same speedup could be achieved without pre-sizing.
It seems like the slowness in the previous version came not from the check itself but from the fact that the vector was optional, so we ended up calling `clone()` every time to upsert `Some()`. But now that it's a non-optional vector and you can always get `&mut` to it, you should be able to check whether the vector's length is 0 and resize it on the first profile, without paying an allocation overhead on later profiles of the same opcode. Checking a vector's length should be negligible overhead compared to cloning (or resizing) one.
So I didn't get why it was changed to pre-sizing, but since it's already an improvement, I'm fine with merging this as is for now.
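The lazy alternative being suggested could be sketched roughly like this (names and signature are hypothetical, not ZJIT's actual code): a cheap emptiness check guards a one-time resize, so unused instructions never allocate.

```rust
// Illustrative stand-in for ZJIT's per-operand profiled type info.
#[derive(Clone, Default, PartialEq, Debug)]
struct ProfiledType(u8);

struct IseqProfile {
    opnd_types: Vec<Vec<ProfiledType>>,
}

impl IseqProfile {
    // Record one operand's type. `num_opnds` is how many operands this
    // instruction profiles (hypothetical parameter for the sketch).
    fn profile_operand(&mut self, insn_idx: usize, num_opnds: usize, opnd_idx: usize, ty: ProfiledType) {
        // The vector is non-optional, so `&mut` access is always available.
        let types = &mut self.opnd_types[insn_idx];
        if types.is_empty() {
            // First profile of this instruction: allocate its slots now.
            // Instructions that are never profiled never pay this.
            types.resize(num_opnds, ProfiledType::default());
        }
        types[opnd_idx] = ty;
    }
}
```

On every profile after the first, the only extra work is the length check, which should be dwarfed by the write itself.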
I'll remove the second commit for now
Hmm, reverting it seems to slow it down by 5% or so. I guess doing multiple allocations at once helps? Since it does give a speedup in benchmarks, I'm actually fine with not reverting it.
```
Running benchmark "getivar" (1/1)
+ setarch x86_64 -R taskset -c 10 /opt/rubies/before/bin/ruby --zjit -I harness /home/k0kubun/src/github.com/Shopify/yjit-bench/benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-11T15:51:28Z mb-benchmark-compile c00566de23) +ZJIT +PRISM [x86_64-linux]
itr: time
#1: 651ms
RSS: 14.7MiB
MAXRSS: 15.9MiB
Running benchmark "getivar" (1/1)
+ setarch x86_64 -R taskset -c 10 /opt/rubies/after/bin/ruby --zjit -I harness /home/k0kubun/src/github.com/Shopify/yjit-bench/benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-11T15:51:28Z mb-benchmark-compile 993d5aa2a4) +ZJIT +PRISM [x86_64-linux]
itr: time
#1: 687ms
RSS: 14.6MiB
MAXRSS: 15.9MiB
Total time spent benchmarking: 1s

before: ruby 3.5.0dev (2025-07-11T15:51:28Z mb-benchmark-compile c00566de23) +ZJIT +PRISM [x86_64-linux]
after: ruby 3.5.0dev (2025-07-11T15:51:28Z mb-benchmark-compile 993d5aa2a4) +ZJIT +PRISM [x86_64-linux]

-------  -----------  ----------  ----------  ----------  -------------  ------------
bench    before (ms)  stddev (%)  after (ms)  stddev (%)  after 1st itr  before/after
getivar  651.6        0.0         687.2       0.0         0.948          0.948
-------  -----------  ----------  ----------  ----------  -------------  ------------
Legend:
- after 1st itr: ratio of before/after time for the first benchmarking iteration.
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
```
I'll probably instead change profiling to not use `Type` but do a `ClassDistribution`-type thing of `(VALUE class, shape_id_t shape)` tuples. Is there a clean way of representing:
- Ruby class
- Fixnum & other immediates
- Shape

in one small data structure?
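One hedged way to sketch such a structure is a small Rust enum (all names here are illustrative; `Value` and `ShapeId` are stand-ins for CRuby's `VALUE` and `shape_id_t`, and this is not an existing ZJIT or CRuby type):

```rust
// Stand-ins for CRuby's VALUE and shape_id_t (illustrative only).
type Value = usize;
type ShapeId = u32;

// A compact profiled-class entry: immediates like Fixnum need neither a
// class pointer nor a shape, while heap objects carry their class plus
// their shape (the shape only really matters for ivar instructions).
#[derive(Clone, Copy, PartialEq, Debug)]
enum ClassDistributionEntry {
    Fixnum,
    Flonum,
    StaticSym,
    NilOrBool,
    Object { class: Value, shape: ShapeId },
}

// Classify a profiled entry for specialization decisions.
fn has_shape(e: ClassDistributionEntry) -> bool {
    matches!(e, ClassDistributionEntry::Object { .. })
}
```

Whether the shape belongs in every entry is exactly the question raised in the next comment; it could also be split out so only ivar instructions pay for it.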
There's no such thing in CRuby itself. YJIT built its own variable-length encoding of profiling data in `Context::encode()`/`Context::decode()`. We probably need something different from YJIT's, so I guess we'll build one for ZJIT too.
Btw, shapes are effectively used only for ivars today, so it's probably inefficient to use a tuple that contains a shape for every instruction.
Force-pushed from 19dda26 to 993d5aa
This is notably faster: no need to hash indices.
Before:
After:
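The shape of the change can be sketched like this (illustrative code, not ZJIT's): instruction indices are dense and bounded by the ISEQ size, so a `Vec` indexed by `insn_idx` can replace a `HashMap` keyed on it, turning a hash-and-probe into a direct offset load on the hot path.

```rust
use std::collections::HashMap;

// Before (sketch): every profile event hashes the instruction index.
fn lookup_hashed(profiles: &HashMap<usize, u32>, insn_idx: usize) -> u32 {
    *profiles.get(&insn_idx).unwrap_or(&0)
}

// After (sketch): dense indices mean a plain Vec works, with no hashing.
fn lookup_vec(profiles: &[u32], insn_idx: usize) -> u32 {
    profiles[insn_idx]
}
```

The trade-off is that the `Vec` allocates a slot for every instruction index, profiled or not, which ties back to the pre-sizing discussion above.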