ZJIT: Use Vec instead of HashMap for profiling #13809

Merged: 3 commits into ruby:master from mb-benchmark-compile on Jul 11, 2025

Conversation

@tekknolagi (Contributor) commented Jul 7, 2025

Using a Vec instead of a HashMap for profiling is notably faster: there is no need to hash instruction indices on every profile event.
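A minimal sketch of the shape of the change (illustrative types and names, not the actual ZJIT code): with a HashMap, every profile event hashes the instruction index; with a Vec, the index is just an offset into dense storage.

```rust
use std::collections::HashMap;

/// Stand-in for whatever per-operand data ZJIT records while profiling.
#[derive(Clone, Default)]
struct OpndProfile;

/// Before (sketch): keyed by instruction index, so every lookup hashes.
#[allow(dead_code)]
struct ProfileByMap {
    opnd_types: HashMap<usize, Vec<OpndProfile>>,
}

/// After (sketch): instruction indices are small dense integers, so a Vec
/// indexed by insn_idx makes each lookup a bounds check plus an offset.
struct ProfileByVec {
    opnd_types: Vec<Vec<OpndProfile>>,
}

impl ProfileByVec {
    fn operands_mut(&mut self, insn_idx: usize) -> &mut Vec<OpndProfile> {
        &mut self.opnd_types[insn_idx] // no hashing on the hot path
    }
}
```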

Before:

plum% samply record ~/.rubies/ruby-zjit/bin/ruby --zjit benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-10T14:40:49Z master 51252ef8d7) +ZJIT dev +PRISM [arm64-darwin24]
itr:   time
 #1: 5311ms
 #2:   49ms
 #3:   49ms
 #4:   48ms

After:

plum% samply record ~/.rubies/ruby-zjit/bin/ruby --zjit benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-10T15:09:06Z mb-benchmark-compile 42ffd3c1ee) +ZJIT dev +PRISM [arm64-darwin24]
itr:   time
 #1: 1332ms
 #2:   49ms
 #3:   48ms
 #4:   48ms

@tekknolagi tekknolagi force-pushed the mb-benchmark-compile branch from 005a81b to 42ffd3c Compare July 10, 2025 15:09
@tekknolagi tekknolagi marked this pull request as ready for review July 10, 2025 15:32
@tekknolagi tekknolagi requested a review from a team July 10, 2025 15:32
@tekknolagi (Contributor, Author):

Now I just have to remove the opcode parameter from the C function the interpreter is calling

impl IseqProfile {
    pub fn new(iseq: IseqPtr) -> Self {
        // Pre-size all the operand slots in the opnd_types table so profiling is as fast as possible
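For context, a hedged sketch of what this pre-sizing could look like (hypothetical helper names, not the exact diff): each slot starts as an empty vec![] and is grown once, up front, so the profiling hot path never has to resize.

```rust
/// Sketch only: pre-size one inner Vec per instruction at ISEQ registration
/// time. `num_insns` and `operand_count` are hypothetical stand-ins for
/// whatever ZJIT derives from the ISEQ.
fn presize_opnd_types(num_insns: usize, operand_count: impl Fn(usize) -> usize) -> Vec<Vec<u8>> {
    let mut opnd_types = vec![vec![]; num_insns]; // vec![] initial values
    for (insn_idx, types) in opnd_types.iter_mut().enumerate() {
        types.resize(operand_count(insn_idx), 0); // one up-front resize each
    }
    opnd_types
}
```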
@k0kubun (Member):

I didn't quite understand how the pre-sizing is helping here. You're still giving vec![] as the initial value to each vector and then resizing each of them.

If we instead resized a vector only the first time we insert at its index, it probably wouldn't be any slower, would it? That way we could skip resizing the vectors for instructions that are never profiled, so not doing the pre-sizing could be faster and save memory, IIUC.

@tekknolagi (Contributor, Author):

This pre-sizing happens once per ISEQ per VM.

The previous code checked whether the vec was the right size on the hot path: once per profile, per opcode, per ISEQ.

It was worth an additional 100-200ms of savings, I think.

@k0kubun (Member) commented Jul 10, 2025:

I get that the current version is faster, but I'm only suggesting the same speedup could be achieved without pre-sizing.

It seems like the slowness in the previous version came not from the "checking" but from the fact that the vector was optional, so we ended up calling clone() every time to upsert Some(). But now that it's a non-optional vector and you can always take &mut to it, you should be able to check whether the vector's length is 0 and resize it on the first profile, without paying any allocation overhead on future profiles of the same opcode. Checking a vector's length should be negligible overhead compared to cloning (or resizing) one.

So I didn't get why it was changed to pre-sizing, but since it's already an improvement, I'm fine with merging this as is for now.
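A hedged sketch of the alternative described here (hypothetical names): leave every slot empty and size it the first time that instruction is profiled, so the allocation happens at most once per instruction and the steady-state cost is a length check.

```rust
/// Sketch only: lazily size the inner Vec on the first profile event.
fn operands_mut(opnd_types: &mut Vec<Vec<u8>>, insn_idx: usize, num_opnds: usize) -> &mut Vec<u8> {
    let types = &mut opnd_types[insn_idx];
    if types.is_empty() {
        types.resize(num_opnds, 0); // allocates only on this insn's first profile
    }
    types // later calls pay just the is_empty() check, no allocation
}
```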

@tekknolagi (Contributor, Author):

I'll remove the second commit for now

@k0kubun (Member):

Hmm, reverting it seems to slow it down by 5% or so. I guess doing multiple allocations at once helps? Since it does give a speedup in benchmarks, I'm actually fine with not reverting it.

Running benchmark "getivar" (1/1)
+ setarch x86_64 -R taskset -c 10 /opt/rubies/before/bin/ruby --zjit -I harness /home/k0kubun/src/github.com/Shopify/yjit-bench/benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-11T15:51:28Z mb-benchmark-compile c00566de23) +ZJIT +PRISM [x86_64-linux]
itr:   time
 #1:  651ms
RSS: 14.7MiB
MAXRSS: 15.9MiB
Running benchmark "getivar" (1/1)
+ setarch x86_64 -R taskset -c 10 /opt/rubies/after/bin/ruby --zjit -I harness /home/k0kubun/src/github.com/Shopify/yjit-bench/benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-11T15:51:28Z mb-benchmark-compile 993d5aa2a4) +ZJIT +PRISM [x86_64-linux]
itr:   time
 #1:  687ms
RSS: 14.6MiB
MAXRSS: 15.9MiB
Total time spent benchmarking: 1s

before: ruby 3.5.0dev (2025-07-11T15:51:28Z mb-benchmark-compile c00566de23) +ZJIT +PRISM [x86_64-linux]
after: ruby 3.5.0dev (2025-07-11T15:51:28Z mb-benchmark-compile 993d5aa2a4) +ZJIT +PRISM [x86_64-linux]

-------  -----------  ----------  ----------  ----------  -------------  ------------
bench    before (ms)  stddev (%)  after (ms)  stddev (%)  after 1st itr  before/after
getivar  651.6        0.0         687.2       0.0         0.948          0.948
-------  -----------  ----------  ----------  ----------  -------------  ------------
Legend:
- after 1st itr: ratio of before/after time for the first benchmarking iteration.
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.

@tekknolagi (Contributor, Author):

I'll probably change profiling to not use Type, and instead do a ClassDistribution-style thing of (VALUE class, shape_id_t shape) tuples. Is there a clean way of representing:

  • Ruby class
  • Fixnum & other immediates
  • Shape

in one small data structure?

@k0kubun (Member):

There's no such thing in CRuby itself. YJIT built its own variable-length encoding of profiling data in Context::encode()/Context::decode(). We probably need something different from YJIT's, so I guess we'll build one for ZJIT too.

Btw, shapes are effectively used only for ivars today, so it's probably inefficient to use a tuple that contains a shape for every instruction.
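Purely as an illustration of the question, one way such a compact representation could look (none of these names exist in CRuby or ZJIT, and the field widths are assumptions); per the point above, the shape case would presumably only be carried at ivar sites:

```rust
/// Hypothetical sketch: one small tagged value covering all three cases.
#[derive(Clone, Copy, PartialEq, Eq)]
enum ProfiledValue {
    /// Ordinary object: the VALUE of its class (assumed to fit in 64 bits).
    Class(u64),
    /// Fixnum and other immediates: the tagged VALUE itself identifies them.
    Immediate(u64),
    /// Ivar sites only: the object's shape_id_t (assumed 32-bit).
    Shape(u32),
}
```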

@tekknolagi tekknolagi force-pushed the mb-benchmark-compile branch from 19dda26 to 993d5aa Compare July 11, 2025 15:51
@tekknolagi tekknolagi enabled auto-merge (squash) July 11, 2025 15:51
@tekknolagi tekknolagi merged commit b0b1712 into ruby:master Jul 11, 2025
84 checks passed
@tekknolagi tekknolagi deleted the mb-benchmark-compile branch July 11, 2025 16:55