
Let the GC update call data information #2698


Closed
wants to merge 10 commits from the update-call-data branch

Conversation

tenderlove
Member

These patches let the GC update call data information, including call data used in static inline caches.

I've added a compact_count to the inline cache so that we can verify if all caches were updated. When the cache is read, we compare the compact count stored on the cache to the compact count in the GC. If those counts match, then we know the compactor was able to update references in the cache.

This patch isn't 100% ready yet; normally we need to keep compact_count commented out so that there is more room for inline cache entries. This is just to see if CI passes.

@ko1
Contributor

ko1 commented Nov 27, 2019

If those counts match, then we know the compactor was able to update references in the cache.

I can't understand the idea. Could you write it out as pseudo-code for the algorithm?

@tenderlove
Member Author

tenderlove commented Nov 27, 2019

@ko1 I've added debug code to ensure the compactor updates all inline caches, including static caches. The algorithm is like this:

$compact_count = 0

def update_references
  heap.each_object do |object|
    case object
    when RubyVM::InstructionSequence
      iseq = object
      iseq.operands.each do |operand|
        case operand
        when InlineCache
          inline_cache = operand
          inline_cache.compact_count = $compact_count
          inline_cache.me = rb_gc_location(inline_cache.me)
        end
      end
    else
      # other updates
    end
  end
end

def gc_compact
  $compact_count += 1
  move_objects
  update_references
end

def vm_sendish cfp, inline_cache
  if inline_cache.compact_count != $compact_count
    raise "GC did not update references for this inline cache"
  end
end

def execute_iseq
  iseq.instructions.each do |instruction, operands|
    case instruction
    when :opt_send_without_block
      inline_cache = operands
      vm_sendish cfp, inline_cache
    else
      # other instructions
    end
  end
end

def mutator
  loop do
    execute_iseq
    gc_compact
  end
end

I need to add an #ifdef for the debug code.

@ko1
Contributor

ko1 commented Nov 27, 2019

Thank you.

def vm_sendish cfp, inline_cache
  if inline_cache.compact_count != $compact_count
    raise "GC did not update references for this inline cache"
  end
end

My concern is how normal method dispatches are affected. The above pseudo-code raises an exception; is that intentional, or is it only an assertion? I can't accept such an exception for normal method dispatch.

@tenderlove
Member Author

My concern is how normal method dispatches are affected. The above pseudo-code raises an exception; is that intentional, or is it only an assertion? I can't accept such an exception for normal method dispatch.

This is only an assertion. I want to make a special CI job to test that all inline caches are updated correctly. The final patch will be more like:

def vm_sendish cfp, inline_cache
#ifdef GC_VERIFY_CALL_CACHE
  if inline_cache.compact_count != $compact_count
    raise "GC did not update references for this inline cache"
  end
#endif
end

tenderlove force-pushed the update-call-data branch 3 times, most recently from 5a91cea to 3b2c776 on December 20, 2019
tenderlove marked this pull request as ready for review on January 9, 2020
@ko1
Contributor

ko1 commented Jan 9, 2020

I'm rewriting the inline method call algorithm completely, and it will support relocation.

@tenderlove
Member Author

tenderlove commented Jan 9, 2020

@ko1 should I close this PR? I want to finish automatic compaction, and I need relocation support. 😄

tenderlove and others added 7 commits on January 13, 2020

- Currently, compaction will invalidate all inline caches. I would like to update the compactor to fix references in inline caches.
- Unfortunately this static variable is not reachable from the GC. This commit introduces a "call data" imemo type to wrap the call data. This way the GC can see the static call data variable and update its references.
- Make sure the compact count on call cache is initialized correctly so that we can confirm inline cache values are updated.
  Co-Authored-By: John Hawthorn <john@hawthorn.email>
- Since the caches are essentially weak references, the method entry could get collected. In that case we'll get a T_NONE back or some unrelated object back from the GC. In that case we just want to ignore it.
- This benchmark is to ensure performance of `rb_funcallv`.
- Now that GC can update inline caches, we can stop invalidating them on compaction.
@tenderlove
Member Author

tenderlove commented Jan 13, 2020

Call cache performance:

I expect this patch to have no impact on most call caches and a very small impact on some. There are two types of call caches: one defined in Ruby and the other defined as static variables.

Ruby Call Caches

These caches are allocated at script compile time and are stored in the instruction sequences. This patch does not change the runtime behavior of call caches defined in Ruby. For example:

class Foo
  def bar(x)
    x.foo  # This runtime behavior doesn't change
  end
  def foo; end
end

x = Foo.new
x.bar x

This patch changes the compaction behavior. It updates references so that we get cache hits even though compaction occurred.

Static Call Caches

This patch does change the behavior of static call caches. Static call caches are typically defined in the rb_funcallv method here (though they can be in other places).

GC cannot see static variables, so this patch introduces a wrapper object. The GC can see the wrapper object, so it can update references stored in the cache.
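In the same pseudo-code style as the algorithm above, the idea is roughly the following (CallDataWrapper is just an illustrative name for the wrapper here, not necessarily what the patch calls it):

# Before: the cache is a C static variable, so the GC never walks it and,
# after compaction, cache.me can still point at the old address of a moved
# method entry.
#
# After: the cache lives behind a heap-allocated "call data" wrapper, so
# update_references visits it like any other object and can fix cache.me.

def update_references
  heap.each_object do |object|
    case object
    when CallDataWrapper
      cache = object.call_data
      cache.compact_count = $compact_count
      cache.me = rb_gc_location(cache.me)
    when RubyVM::InstructionSequence
      # ... update iseq operands as shown earlier ...
    end
  end
end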

Code Impact

Only the runtime performance of rb_funcallv and rb_method_basic_definition_p is affected. The first call fills in the static VALUE, and subsequent calls check whether the value is already filled in. If we check the assembly, the new code will nearly always do a cmpq:

[Instruments screenshot: disassembly showing the added cmpq check]

I think that branch prediction will make this negligible compared to the original.
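In the same pseudo-code style, the extra per-call work is roughly one "is the wrapper already there?" check, which is what the cmpq above corresponds to (the names below are illustrative only):

def rb_funcallv recv, mid, *args
  # Allocate the wrapper lazily on the first call; every subsequent call
  # only checks that it already exists before reusing the cached entry.
  $funcallv_call_data ||= CallDataWrapper.new(InlineCache.new)
  vm_call recv, mid, args, $funcallv_call_data
end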

Performance Tests

Array#grep will call rb_funcallv, so we can use that method to test cache performance impact. I've added a benchmark here.

Benchmark results:

$ make benchmark ARGS='benchmark/array_grep.yml'
/Users/aaron/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::/Users/aaron/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
	            benchmark/array_grep.yml 
Calculating -------------------------------------
                     compare-ruby  built-ruby 
                grep     839.454k    821.895k i/s -      2.000M times in 2.382501s 2.433400s

Comparison:
                             grep
        compare-ruby:    839454.0 i/s 
          built-ruby:    821895.3 i/s - 1.02x  slower

[aaron@tc-lan-adapter ~/g/ruby (update-call-data)]$ /Users/aaron/.rbenv/shims/ruby -v
ruby 2.8.0dev (2020-01-13T12:22:22Z master 5aa0e6bee9) [x86_64-darwin19]
[aaron@tc-lan-adapter ~/g/ruby (update-call-data)]$ ./ruby -v
ruby 2.8.0dev (2020-01-13T21:09:52Z update-call-data 13bd332d8d) [x86_64-darwin19]

The version of rb_funcallv on this branch may be slightly slower, but it's hard to say.

Inline Cache Improvements

To measure inline cache improvements, I recompiled Ruby to output cache hit / miss debug counters.

Here is the test program:

class CachedCalls
  eval ('aa'..'zz').map { |x| "def m#{x}(x); x.foo; end" }.join("\n")
  eval "def call(x);" + ('aa'..'zz').map { |x| "m#{x}(x);" }.join("\n") + "; end"
end

class Foo; def foo; end end

cc = CachedCalls.new
foo = Foo.new
100.times do
  cc.call foo
  GC.compact
end

This program demonstrates inline cache hits and misses when GC compaction is executed.

Here are the counters from master:

$ ./ruby -v thing.rb ^| grep mc_inline
ruby 2.8.0dev (2020-01-13T12:22:22Z master 5aa0e6bee9) [x86_64-darwin19]
[RUBY_DEBUG_COUNTER]	mc_inline_hit                 	        17,533
[RUBY_DEBUG_COUNTER]	mc_inline_miss                	       137,812

Here are the counters from this branch:

$ ./ruby -v thing.rb ^| grep mc_inline
ruby 2.8.0dev (2020-01-13T21:09:52Z update-call-data 13bd332d8d) [x86_64-darwin19]
[RUBY_DEBUG_COUNTER]	mc_inline_hit                 	       151,579
[RUBY_DEBUG_COUNTER]	mc_inline_miss                	         3,766

This patch ensures inline caches are updated so that we get hits instead of misses.

Static Inline Cache Improvements

We can demonstrate that static inline caches are updated during GC compaction as well:

class CachedCalls
  def funcallv
    list = ('a'..'z').to_a

    1_000.times do
      list.grep('z') { 'z' }
      GC.compact
    end
  end
end

class Foo; def foo; end end

cc = CachedCalls.new
cc.funcallv

This benchmark calls grep which uses rb_funcallv. If we compare cache counters we see this:

master branch:

$ ./ruby -v thing.rb ^| grep mc_inline
ruby 2.8.0dev (2020-01-13T12:22:22Z master 5aa0e6bee9) [x86_64-darwin19]
[RUBY_DEBUG_COUNTER]	mc_inline_hit                 	        40,507
[RUBY_DEBUG_COUNTER]	mc_inline_miss                	         5,404

This branch:

$ ./ruby -v thing.rb ^| grep mc_inline
ruby 2.8.0dev (2020-01-13T21:09:52Z update-call-data 13bd332d8d) [x86_64-darwin19]
[RUBY_DEBUG_COUNTER]	mc_inline_hit                 	        43,504
[RUBY_DEBUG_COUNTER]	mc_inline_miss                	         2,407

We can see the inline cache hits increased with this branch.

@ko1
Contributor

ko1 commented Jan 14, 2020

Thank you for the measurements.
However, this only evaluates worst cases. Of course that is important, but we also need to measure normal cases.

For example, the following script generates a script that defines N classes and invokes methods on them. For smaller N, all of the call data fits in the CPU cache. For bigger N, it emulates bigger apps.

N = ARGV.shift.to_i
L = 100_000_000 / N

puts "L=#{L}"
puts N.times.map{|i| "class C#{i}; def foo; end; end"}.join("\n") + "\n" +
     N.times.map{|i| "o#{i} = C#{i}.new"             }.join("\n") + "\n"
puts 'L.times{'
  puts N.times.to_a.shuffle.map{|i| "o#{i}.foo"}.join("\n")
puts '}'
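
For example, with N = 3 the generated script looks roughly like this (the order of the calls inside the loop is shuffled, so it varies between runs):

L=33333333
class C0; def foo; end; end
class C1; def foo; end; end
class C2; def foo; end; end
o0 = C0.new
o1 = C1.new
o2 = C2.new
L.times{
o2.foo
o0.foo
o1.foo
}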

Anyway, as I wrote, I'm rewriting the inline method cache algorithm, so I can't accept this approach. My patch is similar, but it introduces CC sharing with a new CC invalidation protocol.

@tenderlove
Member Author

@ko1 no problem! Thanks for letting me know what you were looking for. Next time I need to benchmark CCs I'll use this info!

@tenderlove tenderlove closed this Jan 14, 2020