Introduce BOP_CMP for optimized comparisons #6851

composerinteralia · 2022-12-02T01:32:53Z

This PR introduces a new basic operator, BOP_CMP, so we can quickly check whether <=> has been redefined when doing optimized comparisons.

The PR is in two commits (plus one more to update the deps), which could be reviewed separately:

Move BOP macros to separate file

This first commit moves ruby_basic_operators and the UNREDEFINED macros out of vm_core.h and into basic_operators.h so that we can use them in internal/compar.h, as well as more broadly in places where we currently do a method lookup via rb_method_basic_definition_p (e.g. object.c, numeric.c, complex.c, enum.c, and a number of other places if we introduce more BOPs).

The most controversial part of this first commit is probably moving redefined_flag out of rb_vm_t. Neither vm_opt_method_def_table nor vm_opt_mid_table is part of rb_vm_t either, and I do think redefined_flag fits well with those. But more significantly, it seems to result in one fewer instruction. For example:

Before:

(lldb) disassemble -n vm_opt_str_freeze
miniruby`vm_exec_core:
miniruby[0x10028233e] <+14558>: movq   0x11a86b(%rip), %rax      ; ruby_current_vm_ptr
miniruby[0x100282345] <+14565>: testb  $0x4, 0x242c(%rax)

After:

(lldb) disassemble -n vm_opt_str_freeze
ruby`vm_exec_core:
ruby[0x100280ebe] <+14510>: testb  $0x4, 0x120147(%rip)      ; ruby_vm_redefined_flag + 43

Introduce BOP_CMP for optimized comparison

Prior to this commit the OPTIMIZED_CMP macro relied on a method lookup to determine whether <=> was overridden. The result of the lookup was cached, but only for the duration of the specific method that initialized the cmp_opt_data cache structure.
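To illustrate why the optimized comparison has to know whether `<=>` was overridden, here is a minimal sketch that temporarily reverses `Integer#<=>` and checks what `[a, b].max` returns. The helper name `with_reversed_integer_cmp` and the alias `__original_cmp` are hypothetical and not part of this PR.

```ruby
# Illustration only: temporarily reverse Integer#<=> and observe [a, b].max.
# The helper name and the alias name are hypothetical.
def with_reversed_integer_cmp
  Integer.class_eval do
    alias_method :__original_cmp, :<=>
    define_method(:<=>) do |other|
      result = __original_cmp(other)
      result && -result # keep nil for incomparable operands
    end
  end
  yield
ensure
  # Put the original method back so later code sees the normal ordering.
  Integer.class_eval do
    alias_method :<=>, :__original_cmp
    remove_method :__original_cmp
  end
end

a, b = 1, 2
default_max  = [a, b].max
reversed_max = with_reversed_integer_cmp { [a, b].max }
restored_max = [a, b].max

puts "default=#{default_max} reversed=#{reversed_max} restored=#{restored_max}"
```

On CRuby the reversed case should pick the smaller value, because the fast path is skipped once `<=>` is redefined and the comparison falls back to calling the method.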

With this method lookup, [x,y].max is slower than doing x > y ? x : y even though there's an optimized instruction for "new array max". (John noticed that somebody had proposed a micro-optimization based on this fact in mastodon/mastodon#19903.)

require 'benchmark/ips'

a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end

Before:

Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower

This commit replaces the method lookup with a new CMP basic op, which gives the examples above equivalent performance.

After:

Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within error
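The basic-op check that replaces the method lookup can be pictured as a bit test on a small per-operator bitmap. The following is a conceptual Ruby model only, not the C implementation: the flag names follow CRuby's naming style, while the bit values and the helpers `mark_redefined`, `basic_op_unredefined?` and `cmp_path` are assumptions made for illustration.

```ruby
# Conceptual model only; CRuby implements this check as C macros.
# Flag names follow CRuby's naming style; bit values and helpers are illustrative.
BOP_CMP = :cmp
INTEGER_REDEFINED_OP_FLAG = 1 << 0
FLOAT_REDEFINED_OP_FLAG   = 1 << 1
STRING_REDEFINED_OP_FLAG  = 1 << 2

# One small bitmap per basic operator; a set bit means "redefined on that class".
REDEFINED_FLAG = Hash.new(0)

# Record that the operator was redefined on the class identified by class_flag.
def mark_redefined(bop, class_flag)
  REDEFINED_FLAG[bop] |= class_flag
end

# A single AND against the bitmap replaces the per-call method lookup.
def basic_op_unredefined?(bop, class_flag)
  (REDEFINED_FLAG[bop] & class_flag).zero?
end

# Returns which path a comparison of a and b would take in this model.
def cmp_path(a, b)
  if a.is_a?(Integer) && b.is_a?(Integer) &&
     basic_op_unredefined?(BOP_CMP, INTEGER_REDEFINED_OP_FLAG)
    :fast      # compare the values directly, no method lookup
  else
    :dispatch  # fall back to calling <=>
  end
end
```

In this model, `cmp_path(1, 2)` stays on the fast path until `mark_redefined(BOP_CMP, INTEGER_REDEFINED_OP_FLAG)` is called. Marking a different class such as Float does not affect Integer comparisons.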

Relevant benchmarks show an improvement to Array#max and Array#min when not using the optimized newarray_max instruction as well. They are noticeably faster for small arrays with the relevant types, and the same or maybe a touch faster on larger arrays.

Benchmark output comparing master@5958c305 against these changes

array_max_int

|                |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|ary2.max        |     31.546M|   45.045M|
|                |           -|     1.43x|
|ary10.max       |     25.907M|   37.594M|
|                |           -|     1.45x|
|ary100.max      |     12.092M|   13.123M|
|                |           -|     1.09x|
|ary500.max      |      3.775M|    3.964M|
|                |           -|     1.05x|
|ary1000.max     |      2.053M|    2.058M|
|                |           -|     1.00x|
|ary2000.max     |    713.012k|  813.339k|
|                |           -|     1.14x|
|ary3000.max     |    848.320k|  856.825k|
|                |           -|     1.01x|
|ary5000.max     |    403.161k|  412.116k|
|                |           -|     1.02x|
|ary10000.max    |    208.381k|  209.780k|
|                |           -|     1.01x|
|ary20000.max    |    106.438k|  107.331k|
|                |           -|     1.01x|
|ary50000.max    |     42.102k|   41.743k|
|                |       1.01x|         -|
|ary100000.max   |     21.131k|   21.069k|
|                |       1.00x|         -|
|ary1000000.max  |      2.052k|    2.013k|
|                |       1.02x|         -|

array_max_float

|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|ary2.max       |     26.882M|   34.843M|
|               |           -|     1.30x|
|ary10.max      |     10.299M|   12.453M|
|               |           -|     1.21x|
|ary100.max     |      1.321M|    1.286M|
|               |       1.03x|         -|
|ary500.max     |    269.818k|  290.909k|
|               |           -|     1.08x|
|ary1000.max    |    136.947k|  152.849k|
|               |           -|     1.12x|
|ary2000.max    |     56.819k|   57.943k|
|               |           -|     1.02x|
|ary3000.max    |     56.480k|   60.751k|
|               |           -|     1.08x|
|ary5000.max    |     28.669k|   30.332k|
|               |           -|     1.06x|
|ary10000.max   |     14.247k|   15.255k|
|               |           -|     1.07x|
|ary20000.max   |      7.037k|    7.691k|
|               |           -|     1.09x|
|ary50000.max   |      2.841k|    3.076k|
|               |           -|     1.08x|
|ary100000.max  |      1.442k|    1.523k|
|               |           -|     1.06x|

array_max_str

|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|ary2.max       |     21.882M|   35.461M|
|               |           -|     1.62x|
|ary10.max      |     11.834M|   13.228M|
|               |           -|     1.12x|
|ary100.max     |      1.398M|    1.410M|
|               |           -|     1.01x|
|ary500.max     |    309.885k|  304.284k|
|               |       1.02x|         -|
|ary1000.max    |    138.160k|  138.101k|
|               |       1.00x|         -|
|ary2000.max    |     57.870k|   61.428k|
|               |           -|     1.06x|
|ary3000.max    |     58.945k|   59.989k|
|               |           -|     1.02x|
|ary5000.max    |     30.331k|   31.941k|
|               |           -|     1.05x|
|ary10000.max   |     12.037k|   12.395k|
|               |           -|     1.03x|
|ary20000.max   |      6.169k|    6.258k|
|               |           -|     1.01x|
|ary50000.max   |      2.459k|    2.506k|
|               |           -|     1.02x|
|ary100000.max  |      1.023k|    1.035k|
|               |           -|     1.01x|

array_min

|                |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|ary2.min        |     27.778M|   45.249M|
|                |           -|     1.63x|
|ary10.min       |     26.667M|   37.313M|
|                |           -|     1.40x|
|ary100.min      |     12.346M|   12.970M|
|                |           -|     1.05x|
|ary500.min      |      3.413M|    4.088M|
|                |           -|     1.20x|
|ary1000.min     |      2.049M|    2.081M|
|                |           -|     1.02x|
|ary2000.min     |    847.027k|  823.113k|
|                |       1.03x|         -|
|ary3000.min     |    788.830k|  857.045k|
|                |           -|     1.09x|
|ary5000.min     |    400.064k|  413.223k|
|                |           -|     1.03x|
|ary10000.min    |    205.225k|  209.661k|
|                |           -|     1.02x|
|ary20000.min    |    106.158k|  106.014k|
|                |       1.00x|         -|
|ary50000.min    |     41.978k|   41.151k|
|                |       1.02x|         -|
|ary100000.min   |     20.452k|   20.793k|
|                |           -|     1.02x|
|ary1000000.min  |      1.970k|    2.015k|
|                |           -|     1.02x|

array_sort_int

|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|ary2.sort      |      4.476M|    4.566M|
|               |           -|     1.02x|
|ary10.sort     |      1.883M|    1.952M|
|               |           -|     1.04x|
|ary100.sort    |    188.740k|  208.212k|
|               |           -|     1.10x|
|ary1000.sort   |     11.633k|   12.404k|
|               |           -|     1.07x|
|ary10000.sort  |     870.835|   901.142|
|               |           -|     1.03x|

range_min

|             |compare-ruby|built-ruby|
|:------------|-----------:|---------:|
|(1..10).min  |     35.793M|   45.426M|
|             |           -|     1.27x|

enum_minmax

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|set2.min         |      1.966M|    2.267M|
|                 |           -|     1.15x|
|set10.min        |      1.340M|    1.479M|
|                 |           -|     1.10x|
|set100.min       |    383.083k|  363.135k|
|                 |       1.05x|         -|
|set1000.min      |     48.824k|   49.068k|
|                 |           -|     1.00x|
|set10000.min     |      5.084k|    5.107k|
|                 |           -|     1.00x|
|set2.max         |      1.979M|    2.536M|
|                 |           -|     1.28x|
|set10.max        |      1.400M|    1.656M|
|                 |           -|     1.18x|
|set100.max       |    381.185k|  368.094k|
|                 |       1.04x|         -|
|set1000.max      |     48.235k|   48.075k|
|                 |       1.00x|         -|
|set10000.max     |      4.971k|    4.841k|
|                 |       1.03x|         -|
|set2.minmax      |      2.278M|    2.352M|
|                 |           -|     1.03x|
|set10.minmax     |      1.558M|    1.429M|
|                 |       1.09x|         -|
|set100.minmax    |    310.299k|  298.998k|
|                 |       1.04x|         -|
|set1000.minmax   |     40.261k|   41.033k|
|                 |           -|     1.02x|
|set10000.minmax  |      4.126k|    4.217k|
|                 |           -|     1.02x|

enum_sort

|                  |compare-ruby|built-ruby|
|:-----------------|-----------:|---------:|
|set2.sort_by      |      1.100M|    1.188M|
|                  |           -|     1.08x|
|set10.sort_by     |    558.534k|  608.643k|
|                  |           -|     1.09x|
|set100.sort_by    |    107.417k|  116.046k|
|                  |           -|     1.08x|
|set1000.sort_by   |     13.077k|   13.804k|
|                  |           -|     1.06x|
|set10000.sort_by  |      1.279k|    1.327k|
|                  |           -|     1.04x|

k0kubun · 2022-12-02T01:52:21Z

Could you benchmark it with either the original motivation, Mastodon, or yjit-bench https://github.com/shopify/yjit-bench? ~~Since this is a type of optimization that we can't keep adding (if the bitmap width becomes too large, it might need to be more complicated/slower), it's nice to check non-microbenchmarks as well.~~

edit: It looks like it would just consume one extra short unlike supporting another core class, so the scalability limit seems not as hard as I initially thought. However, it's still worth checking that it at least doesn't slow down non-microbenchmarks.

composerinteralia · 2022-12-02T18:56:28Z

Thank you! I wasn't familiar with https://github.com/shopify/yjit-bench, so I'm glad to have learned about it.

Here's the output I got:

master: ruby 3.2.0dev (2022-12-02T01:05:22Z master 8abf9e6ad0) [x86_64-darwin21]
cmp-bop: ruby 3.2.0dev (2022-12-02T02:11:24Z cmp-bop 40f0be0e6a) [x86_64-darwin21]

-------------  -----------  ----------  ------------  ----------  --------------  ---------------
bench          master (ms)  stddev (%)  cmp-bop (ms)  stddev (%)  master/cmp-bop  cmp-bop 1st itr
30k_ifelse     3444.2       3.6         3390.4        1.7         1.02            0.75           
30k_methods    7519.0       1.6         7948.7        8.1         0.95            1.00           
activerecord   279.6        2.0         281.3         1.9         0.99            0.97           
binarytrees    902.7        4.1         850.4         2.1         1.06            1.00           
cfunc_itself   273.8        4.2         259.5         2.0         1.05            1.03           
chunky_png     2943.0       0.2         2857.6        0.4         1.03            1.06           
erubi          450.5        4.7         444.4         2.2         1.01            0.96           
erubi_rails    46.4         4.8         46.1          8.4         1.01            0.97           
etanni         602.2        1.5         548.1         1.2         1.10            1.08           
fannkuchredux  14094.4      7.7         13393.4       0.4         1.05            1.05           
fib            649.5        0.4         564.5         0.3         1.15            0.97           
getivar        330.0        1.9         304.4         0.9         1.08            1.00           
hexapdf        7177.7       1.2         5713.5        1.5         1.26            1.26           
keyword_args   580.4        2.3         511.7         1.5         1.13            1.16           
lee            2482.8       5.5         2438.2        1.0         1.02            1.18           
liquid-render  384.4        11.6        373.4         1.5         1.03            0.96           
mail           239.3        2.5         262.2         1.0         0.91            1.05           
nbody          260.2        5.5         267.1         1.2         0.97            0.93           
optcarrot      16054.5      2.4         15287.1       5.2         1.05            0.99           
psych-load     4896.8       0.3         4499.3        3.6         1.09            1.10           
railsbench     4045.6       4.7         3813.8        3.2         1.06            1.15           
respond_to     481.3        3.7         464.8         6.7         1.04            1.01           
ruby-lsp       419.0        3.1         384.9         5.2         1.09            0.89           
rubykon        24634.5      0.6         23064.7       3.3         1.07            0.96           
setivar        293.7        1.3         302.8         4.3         0.97            0.96           
str_concat     155.3        3.7         167.3         8.0         0.93            1.01           
-------------  -----------  ----------  ------------  ----------  --------------  ---------------
Legend:
- master/cmp-bop: ratio of master/cmp-bop time. Higher is better for cmp-bop. Above 1 represents a speedup.
- cmp-bop 1st itr: ratio of master/cmp-bop time for the first benchmarking iteration.

That looks fairly same-ish to me. Do you agree? I also reran a couple individual benchmarks to see how much the results would vary:

master: ruby 3.2.0dev (2022-12-02T01:05:22Z master 8abf9e6ad0) [x86_64-darwin21]                  
cmp-bop: ruby 3.2.0dev (2022-12-02T02:11:24Z cmp-bop 40f0be0e6a) [x86_64-darwin21]                
                        
----------  -----------  ----------  ------------  ----------  --------------  ---------------    
bench       master (ms)  stddev (%)  cmp-bop (ms)  stddev (%)  master/cmp-bop  cmp-bop 1st itr     
30k_ifelse  2609.7       1.3         2548.7        1.7         1.02            1.3
hexapdf     5741.8       1.8         5350.2        2.1         1.07            0.99     
mail        241.1        2.1         240.9         1.5         1.00            0.99          
----------  -----------  ----------  ------------  ----------  --------------  ---------------
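As a quick check of how the master/cmp-bop column is derived, here is a small sketch that recomputes it from the rerun rows above. The `speedup` helper and the constant names are hypothetical, and rounding to two decimals is an assumption about how the table was produced.

```ruby
# Rows copied from the rerun table above: [benchmark, master ms, cmp-bop ms].
ROWS = [
  ["30k_ifelse", 2609.7, 2548.7],
  ["hexapdf",    5741.8, 5350.2],
  ["mail",        241.1,  240.9],
]

# master/cmp-bop: values above 1.0 mean the cmp-bop build took less time.
def speedup(master_ms, branch_ms)
  (master_ms / branch_ms).round(2)
end

RATIOS = ROWS.to_h { |name, master_ms, branch_ms| [name, speedup(master_ms, branch_ms)] }
RATIOS.each { |name, ratio| puts format("%-12s %.2f", name, ratio) }
```

The recomputed ratios match the table, which also shows the hexapdf speedup shrinking from 1.26 in the full run to 1.07 in the rerun.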

k0kubun · 2022-12-02T19:01:41Z

Yeah, at least it doesn't look bad 👍

composerinteralia · 2022-12-02T19:17:18Z

It looks like some tests are failing with Error: The action has timed out.. I'm not quite sure if there's anything for me to do there—is it common for some tests to time out, or have I done something wrong that's causing them to time out?

k0kubun · 2022-12-02T19:28:51Z

It was a fault of old master. Retrying them might fix it, but I've seen GitHub Actions test a PR against old master even after retries, so it might be best to rebase your branch against latest master.

This commit moves ruby_basic_operators and the unredefined macros out of vm_core.h and into basic_operators.h so that we can use them more broadly in places where we currently do a method lookup via `rb_method_basic_definition_p` (e.g. object.c, numeric.c, complex.c, enum.c, but also in internal/compar.h after introducing BOP_CMP and elsewhere if we introduce more BOPs).

The most controversial part of this change is probably moving redefined_flag out of rb_vm_t. [vm_opt_method_def_table and vm_opt_mid_table](https://github.com/ruby/ruby/blob/9da2a5204f32a4f2ce135fddde2abb6e07d647e9/vm.c) are not part of rb_vm_t either, and I think this fits well with those. But more significantly it seems to result in one fewer instruction. For example:

Before:

```
(lldb) disassemble -n vm_opt_str_freeze
miniruby`vm_exec_core:
miniruby[0x10028233e] <+14558>: movq   0x11a86b(%rip), %rax      ; ruby_current_vm_ptr
miniruby[0x100282345] <+14565>: testb  $0x4, 0x242c(%rax)
```

After:

```
(lldb) disassemble -n vm_opt_str_freeze
ruby`vm_exec_core:
ruby[0x100280ebe] <+14510>: testb  $0x4, 0x120147(%rip)      ; ruby_vm_redefined_flag + 43
```

Co-authored-by: John Hawthorn <jhawthorn@github.com>

Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup to determine whether `<=>` was overridden. The result of the lookup was cached, but only for the duration of the specific method that initialized the cmp_opt_data cache structure.

With this method lookup, `[x,y].max` is slower than doing `x > y ? x : y` even though there's an optimized instruction for "new array max". (John noticed that somebody had proposed a micro-optimization based on this fact in mastodon/mastodon#19903.)

```rb
require 'benchmark/ips'

a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```

Before:

```
Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower
```

This commit replaces the method lookup with a new CMP basic op, which gives the examples above equivalent performance.

After:

```
Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within error
```

Relevant benchmarks show an improvement to Array#max and Array#min when not using the optimized newarray_max instruction as well. They are noticeably faster for small arrays with the relevant types, and the same or maybe a touch faster on larger arrays.

```
$ make benchmark COMPARE_RUBY=<master@5958c30> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c30> ITEM=array_max
```

The benchmarks added in this commit also look generally improved.

Co-authored-by: John Hawthorn <jhawthorn@github.com>

k0kubun · 2022-12-10T06:06:41Z

@composerinteralia @jhawthorn The first commit of this PR c43951e seems to have slowed down MJIT on optcarrot and nbody. Here's the comparison of c43951e and its parent 9d4483f:

before: ruby 3.2.0dev (2022-12-06T14:27:15Z master 9d4483f24d) +MJIT [x86_64-linux]
after: ruby 3.2.0dev (2022-12-06T20:37:23Z master c43951e60e) +MJIT [x86_64-linux]

---------  -----------  ----------  ----------  ----------  ------------  -------------
bench      before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
nbody      38.3         19.4        52.9        2.5         0.72          0.97
optcarrot  1399.2       0.7         1494.7      0.7         0.94          1.00
---------  -----------  ----------  ----------  ----------  ------------  -------------

Do you have any idea how to make it as fast as before? This made MJIT slower than YJIT on Optcarrot. (mission accomplished?)

k0kubun · 2022-12-10T06:25:16Z

Out of curiosity, I benchmarked that commit with the interpreter as well. The difference is small, but I can reliably reproduce the interpreter slowdown on nbody:

before: ruby 3.2.0dev (2022-12-06T14:27:15Z master 9d4483f24d) [x86_64-linux]
after: ruby 3.2.0dev (2022-12-06T20:37:23Z master c43951e60e) [x86_64-linux]

-----  -----------  ----------  ----------  ----------  ------------  -------------
bench  before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
nbody  67.4         0.4         68.7        0.4         0.98          0.99
-----  -----------  ----------  ----------  ----------  ------------  -------------

Logs

$ ./run_benchmarks.rb nbody -e "before::/opt/rubies/before/bin/ruby" -e "after::/opt/rubies/after/bin/ruby"
Running benchmark "nbody" (1/1)
setarch x86_64 -R taskset -c 7 /opt/rubies/before/bin/ruby -I ./harness benchmarks/nbody/benchmark.rb
ruby 3.2.0dev (2022-12-06T14:27:15Z master 9d4483f24d) [x86_64-linux]
itr #1: 67ms
itr #2: 67ms
itr #3: 67ms
itr #4: 67ms
itr #5: 67ms
itr #6: 67ms
itr #7: 67ms
itr #8: 67ms
itr #9: 67ms
itr #10: 67ms
itr #11: 67ms
itr #12: 67ms
itr #13: 67ms
itr #14: 67ms
itr #15: 67ms
itr #16: 67ms
itr #17: 67ms
itr #18: 67ms
itr #19: 67ms
itr #20: 67ms
itr #21: 67ms
itr #22: 67ms
itr #23: 67ms
itr #24: 67ms
itr #25: 67ms
itr #26: 67ms
itr #27: 67ms
itr #28: 67ms
itr #29: 67ms
itr #30: 67ms
itr #31: 67ms
itr #32: 67ms
itr #33: 67ms
itr #34: 67ms
itr #35: 67ms
itr #36: 67ms
itr #37: 67ms
itr #38: 67ms
itr #39: 67ms
itr #40: 67ms
itr #41: 67ms
itr #42: 67ms
itr #43: 67ms
itr #44: 67ms
itr #45: 67ms
itr #46: 67ms
itr #47: 67ms
itr #48: 67ms
itr #49: 67ms
itr #50: 67ms
itr #51: 67ms
itr #52: 67ms
itr #53: 67ms
itr #54: 67ms
itr #55: 67ms
itr #56: 67ms
itr #57: 67ms
itr #58: 67ms
itr #59: 68ms
itr #60: 68ms
itr #61: 68ms
itr #62: 67ms
itr #63: 67ms
itr #64: 67ms
itr #65: 67ms
itr #66: 67ms
itr #67: 67ms
itr #68: 67ms
itr #69: 67ms
itr #70: 67ms
itr #71: 67ms
itr #72: 67ms
itr #73: 67ms
itr #74: 67ms
itr #75: 67ms
itr #76: 67ms
itr #77: 67ms
itr #78: 67ms
itr #79: 67ms
itr #80: 67ms
itr #81: 67ms
itr #82: 67ms
itr #83: 67ms
itr #84: 67ms
itr #85: 67ms
itr #86: 67ms
itr #87: 67ms
itr #88: 67ms
itr #89: 67ms
itr #90: 67ms
itr #91: 67ms
itr #92: 67ms
itr #93: 67ms
itr #94: 67ms
itr #95: 67ms
itr #96: 67ms
itr #97: 67ms
itr #98: 67ms
itr #99: 67ms
itr #100: 67ms
itr #101: 67ms
itr #102: 67ms
itr #103: 67ms
itr #104: 67ms
itr #105: 67ms
itr #106: 67ms
itr #107: 67ms
itr #108: 67ms
itr #109: 67ms
itr #110: 67ms
itr #111: 67ms
itr #112: 67ms
itr #113: 67ms
itr #114: 67ms
itr #115: 67ms
itr #116: 67ms
itr #117: 67ms
itr #118: 68ms
itr #119: 68ms
itr #120: 68ms
itr #121: 67ms
itr #122: 67ms
itr #123: 67ms
itr #124: 67ms
itr #125: 67ms
itr #126: 67ms
itr #127: 67ms
itr #128: 67ms
itr #129: 67ms
itr #130: 67ms
itr #131: 67ms
itr #132: 67ms
itr #133: 67ms
itr #134: 67ms
itr #135: 67ms
itr #136: 67ms
itr #137: 67ms
itr #138: 67ms
itr #139: 67ms
itr #140: 67ms
itr #141: 67ms
itr #142: 67ms
itr #143: 67ms
itr #144: 67ms
itr #145: 67ms
itr #146: 67ms
itr #147: 67ms
itr #148: 67ms
itr #149: 67ms
Average of last 134, non-warmup iters: 67ms
Running benchmark "nbody" (1/1)
setarch x86_64 -R taskset -c 7 /opt/rubies/after/bin/ruby -I ./harness benchmarks/nbody/benchmark.rb
ruby 3.2.0dev (2022-12-06T20:37:23Z master c43951e60e) [x86_64-linux]
itr #1: 68ms
itr #2: 68ms
itr #3: 68ms
itr #4: 68ms
itr #5: 68ms
itr #6: 68ms
itr #7: 68ms
itr #8: 68ms
itr #9: 68ms
itr #10: 68ms
itr #11: 68ms
itr #12: 68ms
itr #13: 68ms
itr #14: 68ms
itr #15: 68ms
itr #16: 69ms
itr #17: 68ms
itr #18: 68ms
itr #19: 68ms
itr #20: 68ms
itr #21: 68ms
itr #22: 68ms
itr #23: 68ms
itr #24: 68ms
itr #25: 68ms
itr #26: 68ms
itr #27: 69ms
itr #28: 69ms
itr #29: 68ms
itr #30: 68ms
itr #31: 68ms
itr #32: 68ms
itr #33: 68ms
itr #34: 68ms
itr #35: 68ms
itr #36: 68ms
itr #37: 68ms
itr #38: 68ms
itr #39: 68ms
itr #40: 68ms
itr #41: 68ms
itr #42: 68ms
itr #43: 68ms
itr #44: 68ms
itr #45: 68ms
itr #46: 68ms
itr #47: 68ms
itr #48: 68ms
itr #49: 68ms
itr #50: 68ms
itr #51: 68ms
itr #52: 68ms
itr #53: 68ms
itr #54: 68ms
itr #55: 68ms
itr #56: 68ms
itr #57: 68ms
itr #58: 68ms
itr #59: 68ms
itr #60: 68ms
itr #61: 68ms
itr #62: 68ms
itr #63: 68ms
itr #64: 68ms
itr #65: 68ms
itr #66: 68ms
itr #67: 68ms
itr #68: 68ms
itr #69: 68ms
itr #70: 68ms
itr #71: 68ms
itr #72: 68ms
itr #73: 68ms
itr #74: 68ms
itr #75: 68ms
itr #76: 68ms
itr #77: 68ms
itr #78: 68ms
itr #79: 68ms
itr #80: 68ms
itr #81: 68ms
itr #82: 68ms
itr #83: 68ms
itr #84: 68ms
itr #85: 69ms
itr #86: 69ms
itr #87: 69ms
itr #88: 68ms
itr #89: 68ms
itr #90: 68ms
itr #91: 68ms
itr #92: 68ms
itr #93: 68ms
itr #94: 68ms
itr #95: 68ms
itr #96: 68ms
itr #97: 68ms
itr #98: 68ms
itr #99: 68ms
itr #100: 68ms
itr #101: 68ms
itr #102: 68ms
itr #103: 68ms
itr #104: 68ms
itr #105: 68ms
itr #106: 68ms
itr #107: 68ms
itr #108: 68ms
itr #109: 68ms
itr #110: 68ms
itr #111: 68ms
itr #112: 68ms
itr #113: 68ms
itr #114: 68ms
itr #115: 68ms
itr #116: 68ms
itr #117: 68ms
itr #118: 68ms
itr #119: 68ms
itr #120: 68ms
itr #121: 68ms
itr #122: 68ms
itr #123: 68ms
itr #124: 68ms
itr #125: 68ms
itr #126: 68ms
itr #127: 68ms
itr #128: 68ms
itr #129: 68ms
itr #130: 68ms
itr #131: 68ms
itr #132: 68ms
itr #133: 68ms
itr #134: 68ms
itr #135: 68ms
itr #136: 68ms
itr #137: 68ms
itr #138: 68ms
itr #139: 68ms
itr #140: 68ms
itr #141: 69ms
itr #142: 68ms
itr #143: 69ms
itr #144: 69ms
itr #145: 69ms
itr #146: 68ms
Average of last 131, non-warmup iters: 68ms
Total time spent benchmarking: 20s

before: ruby 3.2.0dev (2022-12-06T14:27:15Z master 9d4483f24d) [x86_64-linux]
after: ruby 3.2.0dev (2022-12-06T20:37:23Z master c43951e60e) [x86_64-linux]

-----  -----------  ----------  ----------  ----------  ------------  -------------
bench  before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
nbody  67.4         0.4         68.7        0.4         0.98          0.99
-----  -----------  ----------  ----------  ----------  ------------  -------------
Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

In the Logs, you can see it used to be consistently 67ms (with a few 68ms itrs) whereas the first commit of this PR makes it 68ms (with a few 69ms itrs). I can kind of see such a slightly worse trend on the blue line of nbody on rubybench from that revision too.

jhawthorn · 2022-12-10T19:02:43Z

@k0kubun thanks for spotting. Unfortunately I don't seem to be able to reproduce. On my machine (a couple years old zen2) I'm seeing nbody slightly faster on the interpreter with the change (seems consistent, though there is definitely a difference in how much between executions), and no difference to MJIT.

./run_benchmarks.rb nbody -e "before::/home/jhawthorn/.rubies/ruby-trunk/bin/ruby" -e "after::/home/jhawthorn/src/ruby/ruby"
...
Average of last 91, non-warmup iters: 94ms
Total time spent benchmarking: 20s

before: ruby 3.2.0dev (2022-12-06T14:27:15Z cmp-bop 9d4483f24d) [x86_64-linux]
after: ruby 3.2.0dev (2022-12-06T20:37:23Z cmp-bop c43951e60e) [x86_64-linux]

-----  -----------  ----------  ----------  ----------  ------------  -------------
bench  before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
nbody  99.4         0.8         94.7        0.9         1.05          1.05
-----  -----------  ----------  ----------  ----------  ------------  -------------
Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

What machine are you on? Your benchmark does look like it's running a fair bit faster than mine 😓. What arguments are you giving to yjit-bench for MJIT?

k0kubun · 2022-12-10T22:23:10Z

Just --mjit. My local machine uses a Ryzen 7 5800X, and the benchmark server is https://rubybench.github.io/hardware.html. Both are on Ubuntu 22.04 with GCC. I could give you ssh access to the benchmark server if you want to try it.

matzbot requested a review from a team December 2, 2022 01:33

composerinteralia force-pushed the cmp-bop branch from 4eef079 to 40f0be0 Compare December 2, 2022 02:12

composerinteralia and others added 2 commits December 2, 2022 14:37

composerinteralia force-pushed the cmp-bop branch from 40f0be0 to ca330c5 Compare December 2, 2022 19:37

Update dependencies

9ce084c

composerinteralia force-pushed the cmp-bop branch from ca330c5 to 9ce084c Compare December 2, 2022 19:45

jhawthorn merged commit 64cdf8b into ruby:master Dec 6, 2022

jhawthorn mentioned this pull request Dec 12, 2022

Fix parens on LIKELY in BASIC_OP_UNREDEFINED_P #6911

Merged

k0kubun mentioned this pull request Dec 22, 2022

Skip reviews for cruby_bindings-only PRs [ci skip] #7004

Merged
