
Optimize callcache invalidation for refinements #13077


Merged
2 commits merged into ruby:master on Jun 9, 2025
Conversation


@alpaca-tc alpaca-tc commented Apr 6, 2025

Fixes [Bug #21201]

This change addresses a performance regression where defining methods inside refine blocks caused severe slowdowns. The issue was due to rb_clear_all_refinement_method_cache() triggering a full object space scan via rb_objspace_each_objects to find and invalidate affected callcaches, which is very inefficient.

To fix this, I introduce vm->cc_refinement_table to track callcaches related to refinements. This allows us to invalidate only the necessary callcaches without scanning the entire heap, resulting in significant performance improvement.
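Roughly, the idea looks like the following sketch (not this PR's exact code: the helper names cc_refinement_register, invalidate_cc_i, and cc_refinement_invalidate_all are made up for illustration, and vm_cc_invalidate stands in for whatever invalidation routine the VM provides). The table's keys are the refinement-related callcaches and the values are unused, so invalidation only has to walk this table:

static st_table *cc_refinement_table; /* the PR keeps this on vm->cc_refinement_table */

static void
cc_refinement_register(const struct rb_callcache *cc)
{
    if (!cc_refinement_table) cc_refinement_table = st_init_numtable();
    /* key: the callcache pointer; value: unused, so just 1 */
    st_insert(cc_refinement_table, (st_data_t)cc, (st_data_t)1);
}

static int
invalidate_cc_i(st_data_t key, st_data_t value, st_data_t arg)
{
    vm_cc_invalidate((const struct rb_callcache *)key); /* mark this callcache stale */
    return ST_DELETE; /* drop the entry once it has been invalidated */
}

static void
cc_refinement_invalidate_all(void)
{
    if (cc_refinement_table) {
        /* walk only the refinement callcaches instead of scanning the whole heap */
        st_foreach(cc_refinement_table, invalidate_cc_i, 0);
    }
}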

Benchmark

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "benchmark-ips"
end

mod = Module.new
klass = Class.new

Benchmark.ips do |x|
  x.report(RUBY_VERSION) do
    mod.send(:refine, klass) do
      def call_1 = nil
      def call_2 = nil
      def call_3 = nil
    end
  end

  x.save! "/tmp/performance_regression_refine.bench"
  x.compare!
end
ruby 3.5.0dev (2025-06-05T14:33:43Z bugs/21201 e7833e82f4) +PRISM [arm64-darwin24]
Warming up --------------------------------------
         3.5.0_patch    50.158k i/100ms
Calculating -------------------------------------
         3.5.0_patch    520.371k (± 7.2%) i/s    (1.92 μs/i) -      2.608M in   5.038852s

Comparison:
               3.2.8:  1597482.0 i/s
         3.5.0_patch:   520371.1 i/s - 3.07x  slower (this PR)
               3.3.8:      213.0 i/s - 7499.45x  slower
               3.4.4:      138.2 i/s - 11561.20x  slower
3.5.0_master_0e0008da0f19d:      137.8 i/s - 11593.43x  slower
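(For reference, and as I understand benchmark-ips: x.save! appends each run's results to the named file and x.compare! folds the previously saved entries into the report, which is how a single comparison can mix runs from 3.2.8 through this patched 3.5.0dev build.)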


@alpaca-tc alpaca-tc force-pushed the bugs/21201 branch 4 times, most recently from 122dce4 to c418bea on April 7, 2025 at 14:08
@alpaca-tc alpaca-tc marked this pull request as ready for review April 7, 2025 14:10
@byroot byroot requested a review from ko1 April 7, 2025 14:38
vm.c (outdated)
@@ -4443,6 +4448,7 @@ Init_vm_objects(void)
     vm->loading_table = st_init_strtable();
     vm->ci_table = st_init_table(&vm_ci_hashtype);
     vm->frozen_strings = st_init_table_with_size(&rb_fstring_hash_type, 10000);
+    vm->cc_refinement_table = st_init_table(&vm_cc_refinement_hashtype);
Member

Suggested change:
-    vm->cc_refinement_table = st_init_table(&vm_cc_refinement_hashtype);
+    vm->cc_refinement_table = st_init_numtable();

I think you can just use a standard numtable, and get rid of your hash and cmp functions.
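For context, a minimal sketch of what that suggestion boils down to (cc_refinement_cmp and cc_refinement_hash stand in for the custom callbacks the PR originally defined; the field order follows struct st_hash_type in st.h):

/* Custom table type: needs explicit compare and hash callbacks. */
static const struct st_hash_type vm_cc_refinement_hashtype = {
    cc_refinement_cmp,  /* int (*compare)(st_data_t, st_data_t) */
    cc_refinement_hash, /* st_index_t (*hash)(st_data_t) */
};
vm->cc_refinement_table = st_init_table(&vm_cc_refinement_hashtype);

/* Plain numtable: keys are hashed and compared as integers (here, the cc pointer),
 * so no custom callbacks are needed at all. */
vm->cc_refinement_table = st_init_numtable();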

Contributor Author

Ah, I see! I force-pushed the fix.

Contributor Author

Does numtable compute hash values from the address?
If so, it might be necessary to rebuild and recalculate the hash values after GC compaction.
Alternatively, using a unique object_id as in the original implementation could result in stable hash values.

Which is better: rebuilding or the original approach?
If rebuilding, where should it happen? I’ll investigate tomorrow.

However, this is my first time writing C code, so I might be saying something incorrect.

Member

Does numtable compute hash values from the address?

Yes it does.

If so, it might be necessary to rebuild and recalculate the hash values after GC compaction.

Indeed, I thought declaring it as a weak table would be enough, but looking closer at the code, it updates the value in place, ignoring the hash.

So we indeed either need an address-independent hash, or a more complex way to update that table after compaction.

Alternatively, using a unique object_id as in the original implementation could result in stable hash values.

object_id inserts that object into two other tables (obj_to_id and id_to_obj), so it's not ideal.

Member

or a more complex way to update that table after compaction.

One simple way could just be to rebuild it entirely after compaction. Iterate on the old one and insert in a new one, then swap them.
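A hedged sketch of that rebuild, assuming it runs once after compaction has finished moving objects (the function names here are made up; rb_gc_location returns an object's post-compaction address):

static int
move_cc_entry_i(st_data_t key, st_data_t value, st_data_t arg)
{
    st_table *new_table = (st_table *)arg;
    /* re-insert under the callcache's possibly-new address so it hashes correctly */
    st_insert(new_table, (st_data_t)rb_gc_location((VALUE)key), value);
    return ST_CONTINUE;
}

static void
rebuild_cc_refinement_table(rb_vm_t *vm)
{
    st_table *new_table = st_init_numtable();
    st_foreach(vm->cc_refinement_table, move_cc_entry_i, (st_data_t)new_table);
    st_free_table(vm->cc_refinement_table);
    vm->cc_refinement_table = new_table; /* swap the old table for the rebuilt one */
}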

Contributor Author

I updated the code to perform rehashing (deletion and reinsertion) within the foreach loop, following the approach used in vm_weak_table_gen_ivar_foreach.

Also, the value was originally added for compatibility with vm_weak_table_foreach_update_weak_key, but it's no longer needed after switching to foreach. So now I'm passing 1 instead of cc, since the value is unused.
Hope #13074 gets merged soon 👍

Contributor Author

I considered the copy-then-swap approach, but if no objects move, it wastes memory by duplicating data unnecessarily.
I'm going with a naive implementation for now, though I'm not confident it's the right choice.
If GC compaction moves most of the callcaches, I'd prefer copy-then-swap.
I'd appreciate any insights on how GC compaction behaves.

Contributor Author

copy-then-swap seems better after all. It's fast when there are few entries, and if there are many, ST_REPLACE will likely happen anyway.
@byroot Updated the code, would appreciate a review 😄

Member

would appreciate a review 😄

I can't see anything wrong with that PR, but I'm not that knowledgeable about GC and Ractors, so that doesn't say much.

I'll leave it to @peterzhu2118 to say if there is an issue with the GC code.

Contributor Author

Thanks to everyone who's already reviewed this; no issues have been spotted so far. @peterzhu2118, when you have a moment, could you take a look at the GC-related changes as well?

@ko1
Contributor

ko1 commented Apr 7, 2025

I'm not sure about compaction, but is it safe for GC.compact?
cc/ @tenderlove

@byroot
Member

byroot commented Apr 7, 2025

I'm not sure about compaction, but is it safe for GC.compact?

Right, it's missing some code in gc/default.c -> gc_update_references to update the call cache entries in that weak map.

@byroot
Member

byroot commented Apr 7, 2025

Never mind, it's already handled because it's declared as a weak table.

@tenderlove
Member

I'm not sure about compaction, but is it safe for GC.compact?

The reference-updating code looks correct to me.

@alpaca-tc alpaca-tc force-pushed the bugs/21201 branch 3 times, most recently from 571398d to 6a20b7c on April 8, 2025 at 15:47
@alpaca-tc
Contributor Author

alpaca-tc commented Apr 8, 2025

CI failed due to a network error, so I rebased and force-pushed to trigger it again.

@alpaca-tc alpaca-tc force-pushed the bugs/21201 branch 3 times, most recently from a2fe156 to a959b6f on April 10, 2025 at 06:08
@alpaca-tc alpaca-tc force-pushed the bugs/21201 branch 3 times, most recently from 2a2198f to 02b2e62 on April 10, 2025 at 13:35
kinoppyd added a commit to kufu/ruby that referenced this pull request Apr 24, 2025
exSOUL added a commit to kufu/ruby that referenced this pull request Apr 24, 2025
@alpaca-tc alpaca-tc marked this pull request as draft May 8, 2025 10:59
@alpaca-tc
Contributor Author

alpaca-tc commented May 8, 2025

I'm fixing the CI errors after rebasing.

Done.

@alpaca-tc alpaca-tc force-pushed the bugs/21201 branch 2 times, most recently from 1c6afb9 to 811d670 on May 9, 2025 at 04:15
@alpaca-tc alpaca-tc marked this pull request as ready for review May 9, 2025 11:43
@alpaca-tc alpaca-tc force-pushed the bugs/21201 branch 2 times, most recently from 348bcef to 15321e3 on May 28, 2025 at 13:07
@ko1 ko1 merged commit c8ddc0a into ruby:master Jun 9, 2025
82 checks passed
@alpaca-tc alpaca-tc deleted the bugs/21201 branch June 9, 2025 03:54