Add ability to trace exit locations in yjit #5970

eileencodes · 2022-06-01T14:42:31Z

When running with --yjit-stats turned on Ruby code can inform the user
what the most common exits are. While this is useful information it
doesn't tell you the source location of the code that exited or what the
code that exited looks like. This change intends to fix that.

To use the feature, run yjit with --yjit-stats and --yjit-trace-exits,
which will record the backtrace for every exit that occurs. Users must save
the output of RubyVM::YJIT.exit_locations to a dump file. That file
can then be read by StackProf to see the code that exited and the
reason.

Example usage:

Given the following script, we write to a file called
concat_array.dump the results of RubyVM::YJIT.exit_locations.

def concat_array
  ["t", "r", *x = "u", "e"].join
end

1000.times do
  concat_array
end

File.write("concat_array.dump", Marshal.dump(RubyVM::YJIT.exit_locations))

When we run the file with this branch and the appropriate flags the
stacktrace will be recorded. Note Stackprof needs to be installed or you
need to point to the library directly.

./ruby --yjit --yjit-call-threshold=1 --yjit-stats --yjit-trace-exits -I/Users/eileencodes/open_source/stackprof/lib test.rb

We can then read the dump file with Stackprof:

./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump

Results will look similar to the following:

==================================
  Mode: ()
  Samples: 1817 (0.00% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1001  (55.1%)        1001  (55.1%)     concatarray
       335  (18.4%)         335  (18.4%)     invokeblock
       178   (9.8%)         178   (9.8%)     send
       140   (7.7%)         140   (7.7%)     opt_getinlinecache
       ...etc...

Simply inspecting the concatarray method will give SOURCE UNAVAILABLE because the source is insns.def.

./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump --method concatarray

Result:

concatarray (nonexistent.def:1)
  samples:  1001 self (55.1%)  /   1001 total (55.1%)
  callers:
    1000  (   99.9%)  Object#concat_array
       1  (    0.1%)  Gem.suffixes
  callees (0 total):
  code:
        SOURCE UNAVAILABLE

However if we go deeper to the callee we can see the exact
source of the concatarray exit.

./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump --method Object#concat_array

Object#concat_array (/Users/eileencodes/open_source/rust_ruby/test.rb:1)
  samples:     0 self (0.0%)  /   1000 total (55.0%)
  callers:
    1000  (  100.0%)  block in <main>
  callees (1000 total):
    1000  (  100.0%)  concatarray
  code:
                                  |     1  | def concat_array
 1000   (55.0%)                   |     2  |   ["t", "r", *x = "u", "e"].join
                                  |     3  | end

The --walk option is recommended for this feature as it make it
easier to traverse the tree of exits.

Goals of this feature:

This feature is meant to give more information when working on YJIT.
The idea is that if we know what code is exiting we can decide what
areas to prioritize when fixing exits. In some cases this means adding
prioritizing avoiding certain exits in yjit. In more complex cases it
might mean changing the Ruby code to be more performant when run with
yjit. Ultimately the more information we have about what code is exiting
AND why, the better we can make yjit.

Known limitations:

Due to tracing exits, running this on large codebases like Rails
can be quite slow.
On complex methods it can still be difficult to pinpoint the exact cause of
an exit.
Stackprof is a requirement to to view the backtrace information from
the dump file

Co-authored-by: Aaron Patterson tenderlove@ruby-lang.org

Ruby Issue: https://bugs.ruby-lang.org/issues/18812

maximecb · 2022-06-01T18:11:16Z

Some comments based on the description:

To use the feature, run yjit with --yjit-stats and --yjit-trace-exits

Maybe --yjit-trace-exits should automatically enable --yjit-stats.

File.write("concat_array.dump", Marshal.dump(RubyVM::YJIT.exit_locations))

Could make sense to add a RubyVM::YJIT.dump_exit_locations(file_name) for convenience?

I wish you'd posted the output on railsbench or a larger example so we can see the full potential.

maximecb · 2022-06-01T18:13:32Z

test/ruby/test_yjit.rb

  def test_compile_putobject
-    assert_compiles('true', insns: %i[putobject], result: true)
-    assert_compiles('123', insns: %i[putobject], result: 123)
-    assert_compiles(':foo', insns: %i[putobject], result: :foo)
+    assert_compiles('true', insns: %i[putobject], result: true, trace_exits: true)
+    assert_compiles('123', insns: %i[putobject], result: 123, trace_exits: true)
+    assert_compiles(':foo', insns: %i[putobject], result: :foo, trace_exits: true)
  end


A bit confused as to why you're setting trace_exits: true for some tests but not others. What are you testing? 🤔

Some of the tests create so much data that tracing exits slowed them down, so instead I chose a bunch that I know produced exits and tested that the hash has the keys I expect (and aren't empty).

maximecb · 2022-06-01T18:15:57Z

yjit.c

+// Parses the yjit_raw_samples and yjit_line_samples data into three
+// hashes (raw, lines, and frames) to be used by RubyVM::YJIT.exit_locations.


What are raw samples and line samples?

maximecb · 2022-06-01T18:18:22Z

yjit/src/codegen.rs

+        // If --yjit-trace-exits option is enabled, record the exit stack
+        // while recording the side exits.
+        if get_option!(gen_trace_exits) {
+            mov(cb, RDI, const_ptr_opnd(exit_pc as *const u8));


use C_ARG_REGS[0] instead of directly using RDI

maximecb · 2022-06-01T18:19:24Z

yjit/src/codegen.rs

+    /// For implementing exit location tracking
+    yjit_raw_samples: Vec<VALUE>,
+    yjit_line_samples: Vec<i32>


The state for this should be stored in stats.rs, not in codegen.rs

I had a struct just for these values, but @noahgibbs recommended moving it into CodegenGlobals but I don't remember the exact reason why. Unless I totally misunderstood what he meant when we paired on this, @noahgibbs do you remember?

Previously the way I was accessing my struct (needing an instance of it) required me to wrap this in unsafe, but that might have been my implementation. I don't know enough Rust to be confident in that.

I think we talked about moving it into CodegenGlobals to avoid having the additional unsafe dance for a new structure. It sounds like @maximecb is recommending an additional unsafe dance for a new structure -- I don't think she wants it in the stats counters, so it would need its own singleton setup and accessors (including unsafe.)

If it was an additional structure in stats.rb, not in counters, you would indeed need unsafe for it. It's basically a singleton or global at that point, and Rust demands unsafe for situations with possible data races like that :-/

I wrapped the counter increments in a macro to avoid having to expose the internals of how its accessed.

It just doesn't semantically belong in CodegenGlobals 😅

maximecb · 2022-06-01T18:21:45Z

yjit/src/stats.rs

+    while idx < yjit_raw_samples.len() as size_t {
+        let num = yjit_raw_samples[idx as usize];
+        let mut i = 0;
+
+        idx += 1;
+
+        while i < i32::from(num) {
+            unsafe { rb_gc_mark(yjit_raw_samples[idx as usize]); }
+            i += 1;
+            idx += 1;
+        }
+
+        idx += 1;
+        idx += 1;
+    }


This needs commenting.

What are raw samples? What kind of values do they contain that needs to be GC'd?

Why does idx +=1 appear twice in a row?

@maximecb Each frame is wrapped by a GC allocated object that is returned via rb_profile_frames.

@eileencodes correct me if I'm wrong, but IIRC the layout of raw_samples looks something like this:

[ len, frame1, frame2, frameN, exit_insn, count, len, frame1, frame2, frameN, exit_insn, count, ... ] ^------------- First Sample ----------------^ ^--------------- Second Sample -------------^ ^--- etc ---^

The array isn't a homogenous list of VALUE pointers, it also contains the exit instruction (which isn't a VALUE) and also book keeping (the number of times we've seen that particular stack in a row and the actual length of the sample).

The two idx bumps skip those two "non GC allocated" objects.

Should explain that quickly in the comments, maybe mention which source file rb_profile_frames is defined in.

maximecb · 2022-06-01T18:23:10Z

yjit/src/stats.rs

+    const BUFF_LEN: usize = 2048;
+    let mut frames_buffer = [VALUE(0 as usize); BUFF_LEN];
+    let mut lines_buffer = [0; BUFF_LEN];
+    let num = unsafe { rb_profile_frames(0, BUFF_LEN as i32, frames_buffer.as_mut_ptr(), lines_buffer.as_mut_ptr()) };
+    let mut i = num - 1;


What does rb_profile_frames do, and where does the 2048 come from?

rb_profile_frames is the CRuby API for getting a "profiling" stack trace. 2048 came from the original stackprof implementation, but I don't think there's any real reason for it. rb_profile_frames doesn't allocate anything, only write in to a buffer you provide. 2048 seemed to be working well for collecting stats on stackprof, so it seemed natural to use here.

maximecb · 2022-06-08T17:38:03Z

test/ruby/test_yjit_exit_locations.rb

+class TestYJITExitLocations < Test::Unit::TestCase
+  def test_yjit_trace_exits_and_v_no_error
+    _stdout, stderr, _status = EnvUtil.invoke_ruby(%w(-v --yjit-trace-exits), '', true, true)
+    refute_includes(stderr, "NoMethodError")
+  end


Good idea moving this into its own test file 👍

When running with `--yjit-stats` turned on, yjit can inform the user what the most common exits are. While this is useful information it doesn't tell you the source location of the code that exited or what the code that exited looks like. This change intends to fix that. To use the feature, run yjit with the `--yjit-trace-exits` option, which will record the backtrace for every exit that occurs. This functionality requires the stats feature to be turned on. Calling `--yjit-trace-exits` will automatically set the `--yjit-stats` option. Users must call `RubyVM::YJIT.dump_exit_locations(filename)` which will Marshal dump the contents of `RubyVM::YJIT.exit_locations` into a file based on the passed filename. *Example usage:* Given the following script, we write to a file called `concat_array.dump` the results of `RubyVM::YJIT.exit_locations`. ```ruby def concat_array ["t", "r", *x = "u", "e"].join end 1000.times do concat_array end RubyVM::YJIT.dump_exit_locations("concat_array.dump") ``` When we run the file with this branch and the appropriate flags the stacktrace will be recorded. Note Stackprof needs to be installed or you need to point to the library directly. ``` ./ruby --yjit --yjit-call-threshold=1 --yjit-trace-exits -I/Users/eileencodes/open_source/stackprof/lib test.rb ``` We can then read the dump file with Stackprof: ``` ./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump ``` Results will look similar to the following: ``` ================================== Mode: () Samples: 1817 (0.00% miss rate) GC: 0 (0.00%) ================================== TOTAL (pct) SAMPLES (pct) FRAME 1001 (55.1%) 1001 (55.1%) concatarray 335 (18.4%) 335 (18.4%) invokeblock 178 (9.8%) 178 (9.8%) send 140 (7.7%) 140 (7.7%) opt_getinlinecache ...etc... ``` Simply inspecting the `concatarray` method will give `SOURCE UNAVAILABLE` because the source is insns.def. ``` ./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump --method concatarray ``` Result: ``` concatarray (nonexistent.def:1) samples: 1001 self (55.1%) / 1001 total (55.1%) callers: 1000 ( 99.9%) Object#concat_array 1 ( 0.1%) Gem.suffixes callees (0 total): code: SOURCE UNAVAILABLE ``` However if we go deeper to the callee we can see the exact source of the `concatarray` exit. ``` ./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump --method Object#concat_array ``` ``` Object#concat_array (/Users/eileencodes/open_source/rust_ruby/test.rb:1) samples: 0 self (0.0%) / 1000 total (55.0%) callers: 1000 ( 100.0%) block in <main> callees (1000 total): 1000 ( 100.0%) concatarray code: | 1 | def concat_array 1000 (55.0%) | 2 | ["t", "r", *x = "u", "e"].join | 3 | end ``` The `--walk` option is recommended for this feature as it make it easier to traverse the tree of exits. *Goals of this feature:* This feature is meant to give more information when working on YJIT. The idea is that if we know what code is exiting we can decide what areas to prioritize when fixing exits. In some cases this means adding prioritizing avoiding certain exits in yjit. In more complex cases it might mean changing the Ruby code to be more performant when run with yjit. Ultimately the more information we have about what code is exiting AND why, the better we can make yjit. *Known limitations:* * Due to tracing exits, running this on large codebases like Rails can be quite slow. * On complex methods it can still be difficult to pinpoint the exact cause of an exit. * Stackprof is a requirement to to view the backtrace information from the dump file. Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>

eileencodes · 2022-06-09T12:35:58Z

Ok I think this is ready. I did the following (which it looks like you already saw @maximecb)

Updated --yjit-trace-exits to automatically enable --yjit-stats.
Added a RubyVM::YJIT.dump_exit_locations(file_name) for convenience
Moved exit locations from codegen globals into stats and made a new struct. I moved the GC code into here as well so we can call YjitExitLocations.gc_mark_raw_samples()
Updated comments to describe what each part is doing
Moved the tests for this feature into their own file.
Got all tests passing

In addition here is an example of using this feature on Rails.

Rails example

Run a Rails test with yjit and trace exits turned on:

RUBYOPT="--yjit --yjit-trace-exits --yjit-call-threshold=1" bundle exec ruby -I/Users/eileencodes/open_source/stackprof/lib/ -Ilib:test test/cases/transactions_test.rb -n test_rollback_dirty_changes

Stackprof output:

==================================
  Mode: ()
  Samples: 96259 (0.00% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
     43208  (44.9%)       43208  (44.9%)     opt_send_without_block
     12264  (12.7%)       12264  (12.7%)     send
     10913  (11.3%)       10913  (11.3%)     invokeblock
      8469   (8.8%)        8469   (8.8%)     getblockparamproxy
      5671   (5.9%)        5671   (5.9%)     invokesuper
      4435   (4.6%)        4435   (4.6%)     opt_getinlinecache
      3681   (3.8%)        3681   (3.8%)     throw
      2028   (2.1%)        2028   (2.1%)     opt_mod
      1190   (1.2%)        1190   (1.2%)     getinstancevariable
      1119   (1.2%)        1119   (1.2%)     opt_eq
       636   (0.7%)         636   (0.7%)     concatarray
       564   (0.6%)         564   (0.6%)     checkmatch
       447   (0.5%)         447   (0.5%)     setlocal_WC_0
       414   (0.4%)         414   (0.4%)     objtostring
       356   (0.4%)         356   (0.4%)     putspecialobject
       286   (0.3%)         286   (0.3%)     getconstant
...

Let's look at the getconstant exit using the --walk option.

ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --walk rails_exits.dump --method getconstant

Our first page of output will look like this which shows us that there are exits in each location.:

getconstant (nonexistent.def:1)
  samples:   286 self (0.3%)  /    286 total (0.3%)
  callers:
     238  (   83.2%)  ActiveRecord::ConnectionAdapters::AbstractAdapter#type_map
      26  (    9.1%)  URI::Generic.default_port
      18  (    6.3%)  Bundler::URI::Generic.default_port
       2  (    0.7%)  Bundler::URI::Generic.component
       1  (    0.3%)  ActiveRecord::ConnectionAdapters::AbstractAdapter#adapter_name
       1  (    0.3%)  JSON.generator=
  callees (0 total):
  code:
        SOURCE UNAVAILABLE
trace_getconstant (nonexistent.def:1)
  samples:     0 self (0.0%)  /      0 total (0.0%)
  callees (0 total):
  code:
        SOURCE UNAVAILABLE


Select next method:
 1)  ActiveRecord::ConnectionAdapters::AbstractAdapter#type_map
 2)  URI::Generic.default_port
 3)  Bundler::URI::Generic.default_port
 4)  Bundler::URI::Generic.component
 5)  ActiveRecord::ConnectionAdapters::AbstractAdapter#adapter_name
 6)  JSON.generator=
 7)  exit

If we pick the first one we can see the exact code that's causing this exit to happen in ActiveRecord::ConnectionAdapters::AbstractAdapter#type_map.

ActiveRecord::ConnectionAdapters::AbstractAdapter#type_map (/Users/eileencodes/open_source/real_rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:800)
  samples:     0 self (0.0%)  /    238 total (0.2%)
  callers:
     238  (  100.0%)  ActiveRecord::ConnectionAdapters::Quoting#lookup_cast_type
  callees (238 total):
     238  (  100.0%)  getconstant
  code:
                                  |   800  |         def type_map
                                  |   801  |           if key = extended_type_map_key
                                  |   802  |             self.class::EXTENDED_TYPE_MAPS.compute_if_absent(key) do
                                  |   803  |               self.class.extended_type_map(**key)
                                  |   804  |             end
                                  |   805  |           else
  238    (0.2%)                   |   806  |             self.class::TYPE_MAP
                                  |   807  |           end

maximecb · 2022-06-09T16:59:34Z

Well done Eileen!

Now that this is complete, I think it would make sense to use the new tool that you've built to pinpoint improvements that we could make. Either improve YJIT to better support rails, or change rails to make it more easily optimizable by YJIT. Something to think about for next steps :)

I originally added the check for RubyVM::YJIT.trace_exit_locations_enabled? to fix errors when these tests run without the stats feature enabled. However I forgot that this will never be true when this test is booting, so nothing was running when the stats feature is turned on. Instead I've decided to make a new hash in the dump file and check if exit locations are enabled there. If they aren't enabled we return early to avoid checking for keys that won't exit in the dumped exit locations. I chose to add this additional enabled check because empty exit locations might not indicate that stats isn't enabled, it could mean the feature is entirely broken. I do want these tests to fail if stats are on and nothing was collected. Followup to ruby#5970

I originally added the check for RubyVM::YJIT.trace_exit_locations_enabled? to fix errors when these tests run without the stats feature enabled. However I forgot that this will never be true when this test is booting, so nothing was running when the stats feature is turned on. Instead I've decided to make a new hash in the dump file and check if exit locations are enabled there. If they aren't enabled we return early to avoid checking for keys that won't exit in the dumped exit locations. I chose to add this additional enabled check because empty exit locations might not indicate that stats isn't enabled, it could mean the feature is entirely broken. I do want these tests to fail if stats are on and nothing was collected. Followup to #5970

eileencodes requested review from maximecb, XrXr and tenderlove as code owners June 1, 2022 14:42

eileencodes force-pushed the yjit-stacktrace-rust3 branch 2 times, most recently from 11283ae to b08eab4 Compare June 1, 2022 15:10

maximecb reviewed Jun 1, 2022

View reviewed changes

eileencodes force-pushed the yjit-stacktrace-rust3 branch 3 times, most recently from 7fd1331 to 82be00d Compare June 8, 2022 16:54

maximecb reviewed Jun 8, 2022

View reviewed changes

maximecb approved these changes Jun 8, 2022

View reviewed changes

eileencodes force-pushed the yjit-stacktrace-rust3 branch from 82be00d to 03ff674 Compare June 9, 2022 12:33

maximecb merged commit 473ee32 into ruby:master Jun 9, 2022

eileencodes deleted the yjit-stacktrace-rust3 branch June 9, 2022 17:03

eileencodes mentioned this pull request Jun 9, 2022

Fix exit locations test #5995

Merged

		// Parses the yjit_raw_samples and yjit_line_samples data into three
		// hashes (raw, lines, and frames) to be used by RubyVM::YJIT.exit_locations.

Add ability to trace exit locations in yjit #5970

Add ability to trace exit locations in yjit #5970

Uh oh!

Conversation

eileencodes commented Jun 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maximecb commented Jun 1, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tenderlove Jun 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eileencodes commented Jun 9, 2022

Rails example

Uh oh!

maximecb commented Jun 9, 2022

Uh oh!

Uh oh!

eileencodes commented Jun 1, 2022 •

edited

Loading

tenderlove Jun 6, 2022 •

edited

Loading