
Let the GC update call data information #2698


Closed
wants to merge 10 commits from the update-call-data branch

Conversation

tenderlove
Member

These patches let the GC update call data information, including call data used in static inline caches.

I've added a compact_count to the inline cache so that we can verify if all caches were updated. When the cache is read, we compare the compact count stored on the cache to the compact count in the GC. If those counts match, then we know the compactor was able to update references in the cache.

This patch isn't 100% ready yet; normally we need to keep compact_count commented out so that there is more room for inline cache entries. This is just to see if CI passes.

@ko1
Contributor

ko1 commented Nov 27, 2019

If those counts match, then we know the compactor was able to update references in the cache.

I can't understand the idea. Could you write it out as pseudo-code for the algorithm?

@tenderlove
Member Author

tenderlove commented Nov 27, 2019

@ko1 I've added debug code to ensure the compactor updates all inline caches, including static caches. The algorithm is like this:

$compact_count = 0

def update_references
  heap.each_object do |object|
    case object
    when RubyVM::InstructionSequence
      iseq = object
      iseq.operands.each do |operand|
        case operand
        when InlineCache
          inline_cache = operand
          inline_cache.compact_count = $compact_count
          inline_cache.me = rb_gc_location(inline_cache.me)
        end
      end
    else
      # other updates
    end
  end
end

def gc_compact
  $compact_count += 1
  move_objects
  update_references
end

def vm_sendish cfp, inline_cache
  if inline_cache.compact_count != $compact_count
    raise "GC did not update references for this inline cache"
  end
end

def execute_iseq
  iseq.instructions.each do |instruction, operands|
    case instruction
    when :opt_send_without_block
      inline_cache = operands
      vm_sendish cfp, inline_cache
    else
      # other instructions
    end
  end
end

def mutator
  loop do
    execute_iseq
    gc_compact
  end
end

I need to add an #ifdef for the debug code.

@ko1
Contributor

ko1 commented Nov 27, 2019

Thank you.

def vm_sendish cfp, inline_cache
  if inline_cache.compact_count != $compact_count
    raise "GC did not update references for this inline cache"
  end
end

My concern is how normal method dispatches are affected. The above pseudo-code raises an exception; is that intentional, or is it only an assertion? I can't accept such an exception for normal method dispatch.

@tenderlove
Member Author

My concern is how normal method dispatches are affected. The above pseudo-code raises an exception; is that intentional, or is it only an assertion? I can't accept such an exception for normal method dispatch.

This is only an assertion. I want to make a special CI job to test that all inline caches are updated correctly. The final patch will be more like:

def vm_sendish cfp, inline_cache
#ifdef GC_VERIFY_CALL_CACHE
  if inline_cache.compact_count != $compact_count
    raise "GC did not update references for this inline cache"
  end
#endif
end

tenderlove force-pushed the update-call-data branch 3 times, most recently from 5a91cea to 3b2c776 on December 20, 2019
tenderlove marked this pull request as ready for review on January 9, 2020
@ko1
Contributor

ko1 commented Jan 9, 2020

I'm rewriting the inline method call algorithm completely, and it will support relocation.

@tenderlove
Member Author

tenderlove commented Jan 9, 2020

@ko1 should I close this PR? I want to finish automatic compaction, and I need relocation support. 😄

tenderlove and others added 7 commits on January 13, 2020

- Currently, compaction will invalidate all inline caches. I would like to update the compactor to fix references in inline caches.
- Unfortunately this static variable is not reachable from the GC. This commit introduces a "call data" imemo type to wrap the call data. This way the GC can see the static call data variable and update its references.
- Make sure the compact count on call cache is initialized correctly so that we can confirm inline cache values are updated.
  Co-Authored-By: John Hawthorn <john@hawthorn.email>
- Since the caches are essentially weak references, the method entry could get collected. In that case we'll get a T_NONE back or some unrelated object back from the GC. In that case we just want to ignore it.
- This benchmark is to ensure performance of `rb_funcallv`.
- Now that GC can update inline caches, we can stop invalidating them on compaction.
@tenderlove
Member Author

tenderlove commented Jan 13, 2020

Call cache performance:

I expect this patch to have no impact on most call caches and a very small impact on some. There are two types of call caches: one defined in Ruby and the other defined as static variables.

Ruby Call Caches

These caches are allocated at script compile time and are stored in the instruction sequences. This patch does not change the runtime behavior of call caches defined in Ruby. For example:

class Foo
  def bar(x)
    x.foo  # This runtime behavior doesn't change
  end
  def foo; end
end

x = Foo.new
x.bar x

This patch changes the compaction behavior. It updates references so that we get cache hits even though compaction occurred.

Static Call Caches

This patch does change the behavior of static call caches. Static call caches are typically defined in the rb_funcallv method here (though they can be in other places).

GC cannot see static variables, so this patch introduces a wrapper object. The GC can see the wrapper object, so it can update references stored in the cache.
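In the same pseudo-code style as the algorithm above, the idea is roughly the following (CallDataWrapper is just an illustrative name for the wrapper here, not necessarily what the patch calls it):

# Before: the cache is a C static variable, so the GC never walks it and,
# after compaction, cache.me can still point at the old address of a moved
# method entry.
#
# After: the cache lives behind a heap-allocated "call data" wrapper, so
# update_references visits it like any other object and can fix cache.me.

def update_references
  heap.each_object do |object|
    case object
    when CallDataWrapper
      cache = object.call_data
      cache.compact_count = $compact_count
      cache.me = rb_gc_location(cache.me)
    when RubyVM::InstructionSequence
      # ... update iseq operands as shown earlier ...
    end
  end
end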

Code Impact

Only the runtime performance of rb_funcallv and rb_method_basic_definition_p is affected. The first call fills in the static VALUE, and subsequent calls check whether the value is already filled in. If we check the assembly, the new code will nearly always do a cmpq:

[Instruments screenshot: disassembly showing the added cmpq check]

I think that branch prediction will make this negligible compared to the original.
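In the same pseudo-code style, the extra per-call work is roughly one "is the wrapper already there?" check, which is what the cmpq above corresponds to (the names below are illustrative only):

def rb_funcallv recv, mid, *args
  # Allocate the wrapper lazily on the first call; every subsequent call
  # only checks that it already exists before reusing the cached entry.
  $funcallv_call_data ||= CallDataWrapper.new(InlineCache.new)
  vm_call recv, mid, args, $funcallv_call_data
end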

Performance Tests

Array#grep will call rb_funcallv, so we can use that method to test cache performance impact. I've added a benchmark here.

Benchmark results:

$ make benchmark ARGS='benchmark/array_grep.yml'
/Users/aaron/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::/Users/aaron/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
	            benchmark/array_grep.yml 
Calculating -------------------------------------
                     compare-ruby  built-ruby 
                grep     839.454k    821.895k i/s -      2.000M times in 2.382501s 2.433400s

Comparison:
                             grep
        compare-ruby:    839454.0 i/s 
          built-ruby:    821895.3 i/s - 1.02x  slower

[aaron@tc-lan-adapter ~/g/ruby (update-call-data)]$ /Users/aaron/.rbenv/shims/ruby -v
ruby 2.8.0dev (2020-01-13T12:22:22Z master 5aa0e6bee9) [x86_64-darwin19]
[aaron@tc-lan-adapter ~/g/ruby (update-call-data)]$ ./ruby -v
ruby 2.8.0dev (2020-01-13T21:09:52Z update-call-data 13bd332d8d) [x86_64-darwin19]

The version of rb_funcallv on this branch may be slightly slower, but it's hard to say.

Inline Cache Improvements

To measure inline cache improvements, I recompiled Ruby to output cache hit / miss debug counters.

Here is the test program:

class CachedCalls
  eval ('aa'..'zz').map { |x| "def m#{x}(x); x.foo; end" }.join("\n")
  eval "def call(x);" + ('aa'..'zz').map { |x| "m#{x}(x);" }.join("\n") + "; end"
end

class Foo; def foo; end end

cc = CachedCalls.new
foo = Foo.new
100.times do
  cc.call foo
  GC.compact
end

This program demonstrates inline cache hits and misses when GC compaction is executed.

Here are the counters from master:

$ ./ruby -v thing.rb ^| grep mc_inline
ruby 2.8.0dev (2020-01-13T12:22:22Z master 5aa0e6bee9) [x86_64-darwin19]
[RUBY_DEBUG_COUNTER]	mc_inline_hit                 	        17,533
[RUBY_DEBUG_COUNTER]	mc_inline_miss                	       137,812

Here are the counters from this branch:

$ ./ruby -v thing.rb ^| grep mc_inline
ruby 2.8.0dev (2020-01-13T21:09:52Z update-call-data 13bd332d8d) [x86_64-darwin19]
[RUBY_DEBUG_COUNTER]	mc_inline_hit                 	       151,579
[RUBY_DEBUG_COUNTER]	mc_inline_miss                	         3,766

This patch ensures inline caches are updated so that we get hits instead of misses.

Static Inline Cache Improvements

We can demonstrate that static inline caches are updated during GC compaction as well:

class CachedCalls
  def funcallv
    list = ('a'..'z').to_a

    1_000.times do
      list.grep('z') { 'z' }
      GC.compact
    end
  end
end

class Foo; def foo; end end

cc = CachedCalls.new
cc.funcallv

This benchmark calls grep which uses rb_funcallv. If we compare cache counters we see this:

master branch:

$ ./ruby -v thing.rb ^| grep mc_inline
ruby 2.8.0dev (2020-01-13T12:22:22Z master 5aa0e6bee9) [x86_64-darwin19]
[RUBY_DEBUG_COUNTER]	mc_inline_hit                 	        40,507
[RUBY_DEBUG_COUNTER]	mc_inline_miss                	         5,404

This branch:

$ ./ruby -v thing.rb ^| grep mc_inline
ruby 2.8.0dev (2020-01-13T21:09:52Z update-call-data 13bd332d8d) [x86_64-darwin19]
[RUBY_DEBUG_COUNTER]	mc_inline_hit                 	        43,504
[RUBY_DEBUG_COUNTER]	mc_inline_miss                	         2,407

We can see the inline cache hits increased with this branch.

@ko1
Contributor

ko1 commented Jan 14, 2020

Thank you for the measurements.
However, this only evaluates worst cases. Of course that is important, but we also need to measure normal cases.

For example, the following script generates a script that defines N classes and invokes methods on them. For smaller N, all of the call data fits in the CPU cache. For bigger N, it emulates bigger apps.

N = ARGV.shift.to_i
L = 100_000_000 / N

puts "L=#{L}"
puts N.times.map{|i| "class C#{i}; def foo; end; end"}.join("\n") + "\n" +
     N.times.map{|i| "o#{i} = C#{i}.new"             }.join("\n") + "\n"
puts 'L.times{'
  puts N.times.to_a.shuffle.map{|i| "o#{i}.foo"}.join("\n")
puts '}'
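
For example, with N = 3 the generated script looks roughly like this (the order of the calls inside the loop is shuffled, so it varies between runs):

L=33333333
class C0; def foo; end; end
class C1; def foo; end; end
class C2; def foo; end; end
o0 = C0.new
o1 = C1.new
o2 = C2.new
L.times{
o2.foo
o0.foo
o1.foo
}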

Anyway, as I wrote, I'm rewriting the inline method cache algorithm, so I can't accept this approach. My patch is similar, but it introduces CC sharing with a new CC invalidation protocol.

@tenderlove
Member Author

@ko1 no problem! Thanks for letting me know what you were looking for. Next time I need to benchmark CCs I'll use this info!

@tenderlove tenderlove closed this Jan 14, 2020