Introduce BOP_CMP for optimized comparisons #6851

Merged: 3 commits merged Dec 6, 2022

composerinteralia
Contributor

@composerinteralia commented Dec 2, 2022

This PR introduces a new basic operator, BOP_CMP, so we can quickly check whether <=> has been redefined when doing optimized comparisons.
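A minimal sketch of the idea, in the C the VM is written in. It assumes a simplified model: the names `BOP_CMP`, `INTEGER_REDEFINED_OP_FLAG`, and `redefined_flag` follow CRuby's naming, but the helpers `basic_op_unredefined_p` and `mark_redefined` are hypothetical (roughly what the `BASIC_OP_UNREDEFINED_P` macro checks), and the real operator and class lists are much longer. Each basic operator gets one `short`, each core class gets one bit, and the hot-path check is one load plus a bit test instead of a method lookup:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: one entry per basic operator. CRuby's real
 * enum (ruby_basic_operators) has many more entries. */
enum basic_op { BOP_PLUS, BOP_LT, BOP_MAX, BOP_CMP, BOP_LAST_ };

/* One bit per core class (illustrative subset). */
#define INTEGER_REDEFINED_OP_FLAG (1 << 0)
#define FLOAT_REDEFINED_OP_FLAG   (1 << 1)
#define STRING_REDEFINED_OP_FLAG  (1 << 2)

/* redefined_flag[op] has a bit set for each class that redefined op. */
static short redefined_flag[BOP_LAST_];

/* Hot-path check: one load and one bit test, no method lookup. */
static inline bool basic_op_unredefined_p(enum basic_op op, int klass_flag)
{
    return (redefined_flag[op] & klass_flag) == 0;
}

/* Hypothetical hook run when user code redefines an operator, e.g.
 * `class Integer; def <=>(other); 0; end; end` marks (BOP_CMP, Integer). */
static void mark_redefined(enum basic_op op, int klass_flag)
{
    assert(op < BOP_LAST_);
    redefined_flag[op] = (short)(redefined_flag[op] | klass_flag);
}
```

In this model, redefining `Integer#<=>` flips one bit, after which only Integer comparisons take the slow path; String comparisons and other Integer operators stay optimized.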

The PR is in two commits (plus one more to update the deps), which could be reviewed separately:

Move BOP macros to separate file

This first commit moves ruby_basic_operators and the UNREDEFINED macros out of
vm_core.h and into basic_operators.h so that we can use them in internal/compar.h, as well as more broadly in places where we currently do a method lookup via rb_method_basic_definition_p (e.g. object.c, numeric.c, complex.c, enum.c, and a number of other places if we introduce more BOPs).

The most controversial part of this first commit is probably moving redefined_flag out of rb_vm_t. Neither vm_opt_method_def_table nor vm_opt_mid_table is part of rb_vm_t either, and I do think redefined_flag fits well with those. But more significantly, it seems to result in one fewer instruction. For example:

Before:

```
(lldb) disassemble -n vm_opt_str_freeze
miniruby`vm_exec_core:
miniruby[0x10028233e] <+14558>: movq   0x11a86b(%rip), %rax      ; ruby_current_vm_ptr
miniruby[0x100282345] <+14565>: testb  $0x4, 0x242c(%rax)
```

After:

```
(lldb) disassemble -n vm_opt_str_freeze
ruby`vm_exec_core:
ruby[0x100280ebe] <+14510>: testb  $0x4, 0x120147(%rip)      ; ruby_vm_redefined_flag + 43
```
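A hedged illustration of why moving the flags out of `rb_vm_t` can save an instruction. The struct, sizes, and index below are made up for the example, and `global_redefined_flag` plays the role of `ruby_vm_redefined_flag` in the disassembly above. With the flags inside a struct reached through a pointer set at runtime, the check needs a pointer load before the bit test; with a standalone global array, the compiler can usually test the bit at a fixed address:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BOPS 32                  /* illustrative size only */
#define BOP_FREEZE_IDX 21            /* illustrative index only */
#define STRING_REDEFINED_OP_FLAG (1 << 2)

/* Before: flags live inside the VM struct, reached via a global pointer.
 * A check needs two dependent accesses: load the pointer, then test the
 * flag at a fixed offset from it. */
struct vm_model {
    long other_fields[1000];         /* stand-in for the rest of rb_vm_t */
    short redefined_flag[NUM_BOPS];
};
static struct vm_model the_vm;
/* External linkage, so the compiler cannot assume it never changes. */
struct vm_model *current_vm_ptr = &the_vm;

static bool freeze_unredefined_before(void)
{
    return (current_vm_ptr->redefined_flag[BOP_FREEZE_IDX]
            & STRING_REDEFINED_OP_FLAG) == 0;
}

/* After: a standalone global array whose address is a link-time constant,
 * so the flag can be tested directly (RIP-relative on x86-64). */
static short global_redefined_flag[NUM_BOPS];

static bool freeze_unredefined_after(void)
{
    return (global_redefined_flag[BOP_FREEZE_IDX]
            & STRING_REDEFINED_OP_FLAG) == 0;
}
```

Whether the pointer load actually disappears depends on the compiler, its flags, and symbol visibility (position-independent builds may still go through a GOT entry), so the lldb output above remains the real evidence.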

Introduce BOP_CMP for optimized comparison

Prior to this commit the OPTIMIZED_CMP macro relied on a method lookup to determine whether <=> was overridden. The result of the lookup was cached, but only for the duration of the specific method that initialized the cmp_opt_data cache structure.
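To make the difference concrete, here is a small model of the two schemes. The lookup counter, `generic_cmp`, and `struct cmp_cache` are hypothetical stand-ins (the real code uses `cmp_opt_data` and `rb_method_basic_definition_p`). The point is only that the old cache is reset on every call, so a two-element `max` pays one lookup per call, while a global flag maintained at redefinition time needs none:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts simulated method lookups (the cost the PR removes). */
static int method_lookups = 0;
static bool integer_cmp_redefined = false;

/* Stand-in for rb_method_basic_definition_p(rb_cInteger, id_cmp). */
static bool lookup_integer_cmp_is_basic(void)
{
    method_lookups++;
    return !integer_cmp_redefined;
}

/* Stand-in for dispatching to a (possibly user-defined) <=>. */
static int generic_cmp(long a, long b) { return (a > b) - (a < b); }

/* Old scheme: a cache that lives only for one call, like cmp_opt_data. */
struct cmp_cache { bool inited; bool optimizable; };

static bool cmp_optimizable(struct cmp_cache *c)
{
    if (!c->inited) {
        c->inited = true;
        c->optimizable = lookup_integer_cmp_is_basic();
    }
    return c->optimizable;
}

/* Requires n >= 1. */
static long max_old(const long *xs, size_t n)
{
    struct cmp_cache cache = { false, false };   /* fresh on every call */
    long best = xs[0];
    for (size_t i = 1; i < n; i++) {
        int c = cmp_optimizable(&cache) ? (xs[i] > best) - (xs[i] < best)
                                        : generic_cmp(xs[i], best);
        if (c > 0) best = xs[i];
    }
    return best;
}

/* New scheme: a global flag updated when <=> is redefined, like BOP_CMP. */
static bool integer_cmp_unredefined = true;

static void on_integer_cmp_redefined(void)
{
    integer_cmp_redefined = true;
    integer_cmp_unredefined = false;
}

/* Requires n >= 1. */
static long max_new(const long *xs, size_t n)
{
    long best = xs[0];
    for (size_t i = 1; i < n; i++) {
        int c = integer_cmp_unredefined ? (xs[i] > best) - (xs[i] < best)
                                        : generic_cmp(xs[i], best);
        if (c > 0) best = xs[i];
    }
    return best;
}
```

With a two-element array, the single cached lookup is amortized over just one comparison, which matches the slowdown of `[a, b].max` relative to the conditional described next.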

With this method lookup, `[x,y].max` is slower than doing `x > y ? x : y` even though there's an optimized instruction for "new array max". (John noticed someone propose a micro-optimization based on this fact in mastodon/mastodon#19903.)

```rb
require "benchmark/ips" # from the benchmark-ips gem

a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```

Before:

```
Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower
```

This commit replaces the method lookup with a new CMP basic op, which gives the examples above equivalent performance.

After:

```
Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within error
```

Relevant benchmarks show an improvement to Array#max and Array#min when not using the optimized newarray_max instruction as well. They are noticeably faster for small arrays with the relevant types, and the same or maybe a touch faster on larger arrays.
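One way to see why the gain shrinks with array size is a rough cost model (an assumption for illustration, not something measured in this PR): let $O$ be per-call overhead shared by both versions, $L$ the method lookup the old code paid roughly once per call, and $c$ the per-element comparison cost.

```latex
T_{\text{old}}(n) \approx O + L + n\,c, \qquad
T_{\text{new}}(n) \approx O + n\,c, \qquad
\frac{T_{\text{old}}(n)}{T_{\text{new}}(n)} \approx 1 + \frac{L}{O + n\,c}
\;\longrightarrow\; 1 \quad (n \to \infty)
```

The model only captures the overall trend in the output below; it does not explain every row (for example, ary10 gains slightly more than ary2 in array_max_int), so treat it as qualitative.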

Benchmark output comparing master@5958c305 against these changes
  • array_max_int

    |                |compare-ruby|built-ruby|
    |:---------------|-----------:|---------:|
    |ary2.max        |     31.546M|   45.045M|
    |                |           -|     1.43x|
    |ary10.max       |     25.907M|   37.594M|
    |                |           -|     1.45x|
    |ary100.max      |     12.092M|   13.123M|
    |                |           -|     1.09x|
    |ary500.max      |      3.775M|    3.964M|
    |                |           -|     1.05x|
    |ary1000.max     |      2.053M|    2.058M|
    |                |           -|     1.00x|
    |ary2000.max     |    713.012k|  813.339k|
    |                |           -|     1.14x|
    |ary3000.max     |    848.320k|  856.825k|
    |                |           -|     1.01x|
    |ary5000.max     |    403.161k|  412.116k|
    |                |           -|     1.02x|
    |ary10000.max    |    208.381k|  209.780k|
    |                |           -|     1.01x|
    |ary20000.max    |    106.438k|  107.331k|
    |                |           -|     1.01x|
    |ary50000.max    |     42.102k|   41.743k|
    |                |       1.01x|         -|
    |ary100000.max   |     21.131k|   21.069k|
    |                |       1.00x|         -|
    |ary1000000.max  |      2.052k|    2.013k|
    |                |       1.02x|         -|
    
  • array_max_float

    |               |compare-ruby|built-ruby|
    |:--------------|-----------:|---------:|
    |ary2.max       |     26.882M|   34.843M|
    |               |           -|     1.30x|
    |ary10.max      |     10.299M|   12.453M|
    |               |           -|     1.21x|
    |ary100.max     |      1.321M|    1.286M|
    |               |       1.03x|         -|
    |ary500.max     |    269.818k|  290.909k|
    |               |           -|     1.08x|
    |ary1000.max    |    136.947k|  152.849k|
    |               |           -|     1.12x|
    |ary2000.max    |     56.819k|   57.943k|
    |               |           -|     1.02x|
    |ary3000.max    |     56.480k|   60.751k|
    |               |           -|     1.08x|
    |ary5000.max    |     28.669k|   30.332k|
    |               |           -|     1.06x|
    |ary10000.max   |     14.247k|   15.255k|
    |               |           -|     1.07x|
    |ary20000.max   |      7.037k|    7.691k|
    |               |           -|     1.09x|
    |ary50000.max   |      2.841k|    3.076k|
    |               |           -|     1.08x|
    |ary100000.max  |      1.442k|    1.523k|
    |               |           -|     1.06x|
    
  • array_max_str

    |               |compare-ruby|built-ruby|
    |:--------------|-----------:|---------:|
    |ary2.max       |     21.882M|   35.461M|
    |               |           -|     1.62x|
    |ary10.max      |     11.834M|   13.228M|
    |               |           -|     1.12x|
    |ary100.max     |      1.398M|    1.410M|
    |               |           -|     1.01x|
    |ary500.max     |    309.885k|  304.284k|
    |               |       1.02x|         -|
    |ary1000.max    |    138.160k|  138.101k|
    |               |       1.00x|         -|
    |ary2000.max    |     57.870k|   61.428k|
    |               |           -|     1.06x|
    |ary3000.max    |     58.945k|   59.989k|
    |               |           -|     1.02x|
    |ary5000.max    |     30.331k|   31.941k|
    |               |           -|     1.05x|
    |ary10000.max   |     12.037k|   12.395k|
    |               |           -|     1.03x|
    |ary20000.max   |      6.169k|    6.258k|
    |               |           -|     1.01x|
    |ary50000.max   |      2.459k|    2.506k|
    |               |           -|     1.02x|
    |ary100000.max  |      1.023k|    1.035k|
    |               |           -|     1.01x|
    
  • array_min

    |                |compare-ruby|built-ruby|
    |:---------------|-----------:|---------:|
    |ary2.min        |     27.778M|   45.249M|
    |                |           -|     1.63x|
    |ary10.min       |     26.667M|   37.313M|
    |                |           -|     1.40x|
    |ary100.min      |     12.346M|   12.970M|
    |                |           -|     1.05x|
    |ary500.min      |      3.413M|    4.088M|
    |                |           -|     1.20x|
    |ary1000.min     |      2.049M|    2.081M|
    |                |           -|     1.02x|
    |ary2000.min     |    847.027k|  823.113k|
    |                |       1.03x|         -|
    |ary3000.min     |    788.830k|  857.045k|
    |                |           -|     1.09x|
    |ary5000.min     |    400.064k|  413.223k|
    |                |           -|     1.03x|
    |ary10000.min    |    205.225k|  209.661k|
    |                |           -|     1.02x|
    |ary20000.min    |    106.158k|  106.014k|
    |                |       1.00x|         -|
    |ary50000.min    |     41.978k|   41.151k|
    |                |       1.02x|         -|
    |ary100000.min   |     20.452k|   20.793k|
    |                |           -|     1.02x|
    |ary1000000.min  |      1.970k|    2.015k|
    |                |           -|     1.02x|
    
  • array_sort_int

    |               |compare-ruby|built-ruby|
    |:--------------|-----------:|---------:|
    |ary2.sort      |      4.476M|    4.566M|
    |               |           -|     1.02x|
    |ary10.sort     |      1.883M|    1.952M|
    |               |           -|     1.04x|
    |ary100.sort    |    188.740k|  208.212k|
    |               |           -|     1.10x|
    |ary1000.sort   |     11.633k|   12.404k|
    |               |           -|     1.07x|
    |ary10000.sort  |     870.835|   901.142|
    |               |           -|     1.03x|
    
    
  • range_min

    |             |compare-ruby|built-ruby|
    |:------------|-----------:|---------:|
    |(1..10).min  |     35.793M|   45.426M|
    |             |           -|     1.27x|
    
  • enum_minmax

    |                 |compare-ruby|built-ruby|
    |:----------------|-----------:|---------:|
    |set2.min         |      1.966M|    2.267M|
    |                 |           -|     1.15x|
    |set10.min        |      1.340M|    1.479M|
    |                 |           -|     1.10x|
    |set100.min       |    383.083k|  363.135k|
    |                 |       1.05x|         -|
    |set1000.min      |     48.824k|   49.068k|
    |                 |           -|     1.00x|
    |set10000.min     |      5.084k|    5.107k|
    |                 |           -|     1.00x|
    |set2.max         |      1.979M|    2.536M|
    |                 |           -|     1.28x|
    |set10.max        |      1.400M|    1.656M|
    |                 |           -|     1.18x|
    |set100.max       |    381.185k|  368.094k|
    |                 |       1.04x|         -|
    |set1000.max      |     48.235k|   48.075k|
    |                 |       1.00x|         -|
    |set10000.max     |      4.971k|    4.841k|
    |                 |       1.03x|         -|
    |set2.minmax      |      2.278M|    2.352M|
    |                 |           -|     1.03x|
    |set10.minmax     |      1.558M|    1.429M|
    |                 |       1.09x|         -|
    |set100.minmax    |    310.299k|  298.998k|
    |                 |       1.04x|         -|
    |set1000.minmax   |     40.261k|   41.033k|
    |                 |           -|     1.02x|
    |set10000.minmax  |      4.126k|    4.217k|
    |                 |           -|     1.02x|
    
  • enum_sort

    |                  |compare-ruby|built-ruby|
    |:-----------------|-----------:|---------:|
    |set2.sort_by      |      1.100M|    1.188M|
    |                  |           -|     1.08x|
    |set10.sort_by     |    558.534k|  608.643k|
    |                  |           -|     1.09x|
    |set100.sort_by    |    107.417k|  116.046k|
    |                  |           -|     1.08x|
    |set1000.sort_by   |     13.077k|   13.804k|
    |                  |           -|     1.06x|
    |set10000.sort_by  |      1.279k|    1.327k|
    |                  |           -|     1.04x|
    

@matzbot requested a review from a team December 2, 2022 01:33
@k0kubun
Member

k0kubun commented Dec 2, 2022

Could you benchmark it with either the original motivation, Mastodon, or yjit-bench https://github.com/shopify/yjit-bench? Since this is a type of optimization that we can't keep adding (if the bitmap width becomes too large, it might need to be more complicated/slower), it's nice to check non-microbenchmarks as well.

edit: It looks like it would just consume one extra short, unlike supporting another core class, so the scalability limit isn't as hard as I initially thought. However, it's still worth checking that it at least doesn't slow down non-microbenchmarks.
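To make the scalability point concrete, here is a hedged sketch of the table's shape, assuming (as the comment describes) one `short` per basic operator with one bit per core class. The enum names and the class list are illustrative, not CRuby's actual lists:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Illustrative model; CRuby's real class and operator lists are longer.
 * Core classes are bit positions inside each flag word. */
enum core_class_bit {
    INTEGER_BIT, FLOAT_BIT, STRING_BIT, ARRAY_BIT, HASH_BIT,
    CLASS_BIT_COUNT
};

/* Hard limit: every class bit must fit in one short. Going past it
 * would mean widening every entry in the table. */
_Static_assert((size_t)CLASS_BIT_COUNT <= sizeof(short) * CHAR_BIT,
               "too many core classes for a short flag word");

/* Operators are array entries, so adding one (like BOP_CMP) only
 * adds one more short to the table. */
enum basic_op { BOP_PLUS, BOP_LT, BOP_MAX, BOP_CMP, BOP_LAST_ };
static short redefined_flag[BOP_LAST_];

static size_t flag_table_bytes(void)
{
    return sizeof redefined_flag;
}
```

In this model, adding BOP_CMP grows the table by `sizeof(short)` bytes, while each new core class consumes one of a fixed number of bits in every entry, which is the harder limit.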

@composerinteralia
Contributor Author

composerinteralia commented Dec 2, 2022

Thank you! I wasn't familiar with https://github.com/shopify/yjit-bench, so I'm glad to have learned about it.

Here's the output I got:

master: ruby 3.2.0dev (2022-12-02T01:05:22Z master 8abf9e6ad0) [x86_64-darwin21]
cmp-bop: ruby 3.2.0dev (2022-12-02T02:11:24Z cmp-bop 40f0be0e6a) [x86_64-darwin21]

-------------  -----------  ----------  ------------  ----------  --------------  ---------------
bench          master (ms)  stddev (%)  cmp-bop (ms)  stddev (%)  master/cmp-bop  cmp-bop 1st itr
30k_ifelse     3444.2       3.6         3390.4        1.7         1.02            0.75           
30k_methods    7519.0       1.6         7948.7        8.1         0.95            1.00           
activerecord   279.6        2.0         281.3         1.9         0.99            0.97           
binarytrees    902.7        4.1         850.4         2.1         1.06            1.00           
cfunc_itself   273.8        4.2         259.5         2.0         1.05            1.03           
chunky_png     2943.0       0.2         2857.6        0.4         1.03            1.06           
erubi          450.5        4.7         444.4         2.2         1.01            0.96           
erubi_rails    46.4         4.8         46.1          8.4         1.01            0.97           
etanni         602.2        1.5         548.1         1.2         1.10            1.08           
fannkuchredux  14094.4      7.7         13393.4       0.4         1.05            1.05           
fib            649.5        0.4         564.5         0.3         1.15            0.97           
getivar        330.0        1.9         304.4         0.9         1.08            1.00           
hexapdf        7177.7       1.2         5713.5        1.5         1.26            1.26           
keyword_args   580.4        2.3         511.7         1.5         1.13            1.16           
lee            2482.8       5.5         2438.2        1.0         1.02            1.18           
liquid-render  384.4        11.6        373.4         1.5         1.03            0.96           
mail           239.3        2.5         262.2         1.0         0.91            1.05           
nbody          260.2        5.5         267.1         1.2         0.97            0.93           
optcarrot      16054.5      2.4         15287.1       5.2         1.05            0.99           
psych-load     4896.8       0.3         4499.3        3.6         1.09            1.10           
railsbench     4045.6       4.7         3813.8        3.2         1.06            1.15           
respond_to     481.3        3.7         464.8         6.7         1.04            1.01           
ruby-lsp       419.0        3.1         384.9         5.2         1.09            0.89           
rubykon        24634.5      0.6         23064.7       3.3         1.07            0.96           
setivar        293.7        1.3         302.8         4.3         0.97            0.96           
str_concat     155.3        3.7         167.3         8.0         0.93            1.01           
-------------  -----------  ----------  ------------  ----------  --------------  ---------------
Legend:
- master/cmp-bop: ratio of master/cmp-bop time. Higher is better for cmp-bop. Above 1 represents a speedup.
- cmp-bop 1st itr: ratio of master/cmp-bop time for the first benchmarking iteration.
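As a quick check of the legend against the table above: the ratio divides master's time by cmp-bop's, so values above 1 mean cmp-bop took less time.

```latex
\text{master/cmp-bop} = \frac{t_{\text{master}}}{t_{\text{cmp-bop}}}, \qquad
\text{hexapdf: } \frac{7177.7}{5713.5} \approx 1.26, \qquad
\text{mail: } \frac{239.3}{262.2} \approx 0.91
```

Both values match the master/cmp-bop column.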

That looks fairly same-ish to me. Do you agree? I also reran a couple individual benchmarks to see how much the results would vary:

master: ruby 3.2.0dev (2022-12-02T01:05:22Z master 8abf9e6ad0) [x86_64-darwin21]                  
cmp-bop: ruby 3.2.0dev (2022-12-02T02:11:24Z cmp-bop 40f0be0e6a) [x86_64-darwin21]                
                        
----------  -----------  ----------  ------------  ----------  --------------  ---------------    
bench       master (ms)  stddev (%)  cmp-bop (ms)  stddev (%)  master/cmp-bop  cmp-bop 1st itr     
30k_ifelse  2609.7       1.3         2548.7        1.7         1.02            1.3
hexapdf     5741.8       1.8         5350.2        2.1         1.07            0.99     
mail        241.1        2.1         240.9         1.5         1.00            0.99          
----------  -----------  ----------  ------------  ----------  --------------  ---------------

@k0kubun
Member

k0kubun commented Dec 2, 2022

Yeah, at least it doesn't look bad 👍

@composerinteralia
Contributor Author

It looks like some tests are failing with "Error: The action has timed out." I'm not quite sure whether there's anything for me to do there. Is it common for some tests to time out, or have I done something wrong that's causing them to time out?

@k0kubun
Member

k0kubun commented Dec 2, 2022

It was a fault of the old master. Retrying might fix it, but I've seen GitHub Actions test a PR against the old master even after retries, so it might be best to rebase your branch on the latest master.

composerinteralia and others added 2 commits December 2, 2022 14:37
This commit moves ruby_basic_operators and the unredefined macros out of
vm_core.h and into basic_operators.h so that we can use them more
broadly in places where we currently use a method lookup via
`rb_method_basic_definition_p` (e.g. object.c, numeric.c, complex.c,
enum.c, but also in internal/compar.h after introducing BOP_CMP and
elsewhere if we introduce more BOPs).

The most controversial part of this change is probably moving
redefined_flag out of rb_vm_t. [vm_opt_method_def_table and
vm_opt_mid_table](https://github.com/ruby/ruby/blob/9da2a5204f32a4f2ce135fddde2abb6e07d647e9/vm.c)
are not part of rb_vm_t either, and I think this fits well with those.
But more significantly it seems to result in one fewer instruction. For
example:

Before:

```
(lldb) disassemble -n vm_opt_str_freeze
miniruby`vm_exec_core:
miniruby[0x10028233e] <+14558>: movq   0x11a86b(%rip), %rax      ; ruby_current_vm_ptr
miniruby[0x100282345] <+14565>: testb  $0x4, 0x242c(%rax)
```

After:

```
(lldb) disassemble -n vm_opt_str_freeze
ruby`vm_exec_core:
ruby[0x100280ebe] <+14510>: testb  $0x4, 0x120147(%rip)      ; ruby_vm_redefined_flag + 43
```

Co-authored-by: John Hawthorn <jhawthorn@github.com>

Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup
to determine whether `<=>` was overridden. The result of the lookup was
cached, but only for the duration of the specific method that
initialized the cmp_opt_data cache structure.

With this method lookup, `[x,y].max` is slower than doing `x > y ? x : y`
even though there's an optimized instruction for "new array max".
(John noticed someone propose a micro-optimization based on this fact
in mastodon/mastodon#19903.)

```rb
require "benchmark/ips" # from the benchmark-ips gem

a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```

Before:

```
Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower
```

This commit replaces the method lookup with a new CMP basic op, which
gives the examples above equivalent performance.

After:

```
Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within error
```

Relevant benchmarks show an improvement to Array#max and Array#min when
not using the optimized newarray_max instruction as well. They are
noticeably faster for small arrays with the relevant types, and the same
or maybe a touch faster on larger arrays.

```
$ make benchmark COMPARE_RUBY=<master@5958c30> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c30> ITEM=array_max
```

The benchmarks added in this commit also look generally improved.

Co-authored-by: John Hawthorn <jhawthorn@github.com>
@k0kubun
Member

k0kubun commented Dec 10, 2022

@composerinteralia @jhawthorn The first commit of this PR c43951e seems to have slowed down MJIT on optcarrot and nbody. Here's the comparison of c43951e and its parent 9d4483f:

before: ruby 3.2.0dev (2022-12-06T14:27:15Z master 9d4483f24d) +MJIT [x86_64-linux]
after: ruby 3.2.0dev (2022-12-06T20:37:23Z master c43951e60e) +MJIT [x86_64-linux]

---------  -----------  ----------  ----------  ----------  ------------  -------------
bench      before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
nbody      38.3         19.4        52.9        2.5         0.72          0.97
optcarrot  1399.2       0.7         1494.7      0.7         0.94          1.00
---------  -----------  ----------  ----------  ----------  ------------  -------------

Do you have any ideas for making it as fast as before? This made MJIT slower than YJIT on Optcarrot. (Mission accomplished?)

@k0kubun
Member

k0kubun commented Dec 10, 2022

Out of curiosity, I benchmarked that commit with the interpreter as well. The difference is small, but I can reliably reproduce the interpreter slowdown on nbody:

before: ruby 3.2.0dev (2022-12-06T14:27:15Z master 9d4483f24d) [x86_64-linux]
after: ruby 3.2.0dev (2022-12-06T20:37:23Z master c43951e60e) [x86_64-linux]

-----  -----------  ----------  ----------  ----------  ------------  -------------
bench  before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
nbody  67.4         0.4         68.7        0.4         0.98          0.99
-----  -----------  ----------  ----------  ----------  ------------  -------------
Logs
$ ./run_benchmarks.rb nbody -e "before::/opt/rubies/before/bin/ruby" -e "after::/opt/rubies/after/bin/ruby"
Running benchmark "nbody" (1/1)
setarch x86_64 -R taskset -c 7 /opt/rubies/before/bin/ruby -I ./harness benchmarks/nbody/benchmark.rb
ruby 3.2.0dev (2022-12-06T14:27:15Z master 9d4483f24d) [x86_64-linux]
itr #1: 67ms
itr #2: 67ms
itr #3: 67ms
itr #4: 67ms
itr #5: 67ms
itr #6: 67ms
itr #7: 67ms
itr #8: 67ms
itr #9: 67ms
itr #10: 67ms
itr #11: 67ms
itr #12: 67ms
itr #13: 67ms
itr #14: 67ms
itr #15: 67ms
itr #16: 67ms
itr #17: 67ms
itr #18: 67ms
itr #19: 67ms
itr #20: 67ms
itr #21: 67ms
itr #22: 67ms
itr #23: 67ms
itr #24: 67ms
itr #25: 67ms
itr #26: 67ms
itr #27: 67ms
itr #28: 67ms
itr #29: 67ms
itr #30: 67ms
itr #31: 67ms
itr #32: 67ms
itr #33: 67ms
itr #34: 67ms
itr #35: 67ms
itr #36: 67ms
itr #37: 67ms
itr #38: 67ms
itr #39: 67ms
itr #40: 67ms
itr #41: 67ms
itr #42: 67ms
itr #43: 67ms
itr #44: 67ms
itr #45: 67ms
itr #46: 67ms
itr #47: 67ms
itr #48: 67ms
itr #49: 67ms
itr #50: 67ms
itr #51: 67ms
itr #52: 67ms
itr #53: 67ms
itr #54: 67ms
itr #55: 67ms
itr #56: 67ms
itr #57: 67ms
itr #58: 67ms
itr #59: 68ms
itr #60: 68ms
itr #61: 68ms
itr #62: 67ms
itr #63: 67ms
itr #64: 67ms
itr #65: 67ms
itr #66: 67ms
itr #67: 67ms
itr #68: 67ms
itr #69: 67ms
itr #70: 67ms
itr #71: 67ms
itr #72: 67ms
itr #73: 67ms
itr #74: 67ms
itr #75: 67ms
itr #76: 67ms
itr #77: 67ms
itr #78: 67ms
itr #79: 67ms
itr #80: 67ms
itr #81: 67ms
itr #82: 67ms
itr #83: 67ms
itr #84: 67ms
itr #85: 67ms
itr #86: 67ms
itr #87: 67ms
itr #88: 67ms
itr #89: 67ms
itr #90: 67ms
itr #91: 67ms
itr #92: 67ms
itr #93: 67ms
itr #94: 67ms
itr #95: 67ms
itr #96: 67ms
itr #97: 67ms
itr #98: 67ms
itr #99: 67ms
itr #100: 67ms
itr #101: 67ms
itr #102: 67ms
itr #103: 67ms
itr #104: 67ms
itr #105: 67ms
itr #106: 67ms
itr #107: 67ms
itr #108: 67ms
itr #109: 67ms
itr #110: 67ms
itr #111: 67ms
itr #112: 67ms
itr #113: 67ms
itr #114: 67ms
itr #115: 67ms
itr #116: 67ms
itr #117: 67ms
itr #118: 68ms
itr #119: 68ms
itr #120: 68ms
itr #121: 67ms
itr #122: 67ms
itr #123: 67ms
itr #124: 67ms
itr #125: 67ms
itr #126: 67ms
itr #127: 67ms
itr #128: 67ms
itr #129: 67ms
itr #130: 67ms
itr #131: 67ms
itr #132: 67ms
itr #133: 67ms
itr #134: 67ms
itr #135: 67ms
itr #136: 67ms
itr #137: 67ms
itr #138: 67ms
itr #139: 67ms
itr #140: 67ms
itr #141: 67ms
itr #142: 67ms
itr #143: 67ms
itr #144: 67ms
itr #145: 67ms
itr #146: 67ms
itr #147: 67ms
itr #148: 67ms
itr #149: 67ms
Average of last 134, non-warmup iters: 67ms
Running benchmark "nbody" (1/1)
setarch x86_64 -R taskset -c 7 /opt/rubies/after/bin/ruby -I ./harness benchmarks/nbody/benchmark.rb
ruby 3.2.0dev (2022-12-06T20:37:23Z master c43951e60e) [x86_64-linux]
itr #1: 68ms
itr #2: 68ms
itr #3: 68ms
itr #4: 68ms
itr #5: 68ms
itr #6: 68ms
itr #7: 68ms
itr #8: 68ms
itr #9: 68ms
itr #10: 68ms
itr #11: 68ms
itr #12: 68ms
itr #13: 68ms
itr #14: 68ms
itr #15: 68ms
itr #16: 69ms
itr #17: 68ms
itr #18: 68ms
itr #19: 68ms
itr #20: 68ms
itr #21: 68ms
itr #22: 68ms
itr #23: 68ms
itr #24: 68ms
itr #25: 68ms
itr #26: 68ms
itr #27: 69ms
itr #28: 69ms
itr #29: 68ms
itr #30: 68ms
itr #31: 68ms
itr #32: 68ms
itr #33: 68ms
itr #34: 68ms
itr #35: 68ms
itr #36: 68ms
itr #37: 68ms
itr #38: 68ms
itr #39: 68ms
itr #40: 68ms
itr #41: 68ms
itr #42: 68ms
itr #43: 68ms
itr #44: 68ms
itr #45: 68ms
itr #46: 68ms
itr #47: 68ms
itr #48: 68ms
itr #49: 68ms
itr #50: 68ms
itr #51: 68ms
itr #52: 68ms
itr #53: 68ms
itr #54: 68ms
itr #55: 68ms
itr #56: 68ms
itr #57: 68ms
itr #58: 68ms
itr #59: 68ms
itr #60: 68ms
itr #61: 68ms
itr #62: 68ms
itr #63: 68ms
itr #64: 68ms
itr #65: 68ms
itr #66: 68ms
itr #67: 68ms
itr #68: 68ms
itr #69: 68ms
itr #70: 68ms
itr #71: 68ms
itr #72: 68ms
itr #73: 68ms
itr #74: 68ms
itr #75: 68ms
itr #76: 68ms
itr #77: 68ms
itr #78: 68ms
itr #79: 68ms
itr #80: 68ms
itr #81: 68ms
itr #82: 68ms
itr #83: 68ms
itr #84: 68ms
itr #85: 69ms
itr #86: 69ms
itr #87: 69ms
itr #88: 68ms
itr #89: 68ms
itr #90: 68ms
itr #91: 68ms
itr #92: 68ms
itr #93: 68ms
itr #94: 68ms
itr #95: 68ms
itr #96: 68ms
itr #97: 68ms
itr #98: 68ms
itr #99: 68ms
itr #100: 68ms
itr #101: 68ms
itr #102: 68ms
itr #103: 68ms
itr #104: 68ms
itr #105: 68ms
itr #106: 68ms
itr #107: 68ms
itr #108: 68ms
itr #109: 68ms
itr #110: 68ms
itr #111: 68ms
itr #112: 68ms
itr #113: 68ms
itr #114: 68ms
itr #115: 68ms
itr #116: 68ms
itr #117: 68ms
itr #118: 68ms
itr #119: 68ms
itr #120: 68ms
itr #121: 68ms
itr #122: 68ms
itr #123: 68ms
itr #124: 68ms
itr #125: 68ms
itr #126: 68ms
itr #127: 68ms
itr #128: 68ms
itr #129: 68ms
itr #130: 68ms
itr #131: 68ms
itr #132: 68ms
itr #133: 68ms
itr #134: 68ms
itr #135: 68ms
itr #136: 68ms
itr #137: 68ms
itr #138: 68ms
itr #139: 68ms
itr #140: 68ms
itr #141: 69ms
itr #142: 68ms
itr #143: 69ms
itr #144: 69ms
itr #145: 69ms
itr #146: 68ms
Average of last 131, non-warmup iters: 68ms
Total time spent benchmarking: 20s

before: ruby 3.2.0dev (2022-12-06T14:27:15Z master 9d4483f24d) [x86_64-linux]
after: ruby 3.2.0dev (2022-12-06T20:37:23Z master c43951e60e) [x86_64-linux]

-----  -----------  ----------  ----------  ----------  ------------  -------------
bench  before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
nbody  67.4         0.4         68.7        0.4         0.98          0.99
-----  -----------  ----------  ----------  ----------  ------------  -------------
Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

In the Logs, you can see that it used to be consistently 67ms (with a few 68ms iterations), whereas after the first commit of this PR it is 68ms (with a few 69ms iterations). I can also see a similar slight regression from that revision on the blue nbody line on rubybench.

@jhawthorn
Member

jhawthorn commented Dec 10, 2022

@k0kubun thanks for spotting this. Unfortunately, I can't seem to reproduce it. On my machine (a Zen 2 that's a couple of years old), nbody is slightly faster on the interpreter with the change (this seems consistent, though how much faster varies between runs), and there's no difference with MJIT.

./run_benchmarks.rb nbody -e "before::/home/jhawthorn/.rubies/ruby-trunk/bin/ruby" -e "after::/home/jhawthorn/src/ruby/ruby"
...
Average of last 91, non-warmup iters: 94ms
Total time spent benchmarking: 20s

before: ruby 3.2.0dev (2022-12-06T14:27:15Z cmp-bop 9d4483f24d) [x86_64-linux]
after: ruby 3.2.0dev (2022-12-06T20:37:23Z cmp-bop c43951e60e) [x86_64-linux]

-----  -----------  ----------  ----------  ----------  ------------  -------------
bench  before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
nbody  99.4         0.8         94.7        0.9         1.05          1.05
-----  -----------  ----------  ----------  ----------  ------------  -------------
Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

What machine are you on? Your benchmark does look like it's running a fair bit faster than mine 😓. What arguments are you giving to yjit-bench for MJIT?

@k0kubun
Member

k0kubun commented Dec 10, 2022

Just --mjit. My local machine uses a Ryzen 7 5800X, and the benchmark server is https://rubybench.github.io/hardware.html. Both run Ubuntu 22.04 with GCC. I could give you SSH access to the benchmark server if you want to try it.
