Skip to content

Add ability to trace exit locations in yjit #5970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 9, 2022

Conversation

eileencodes
Copy link
Member

@eileencodes eileencodes commented Jun 1, 2022

When running with --yjit-stats turned on Ruby code can inform the user
what the most common exits are. While this is useful information it
doesn't tell you the source location of the code that exited or what the
code that exited looks like. This change intends to fix that.

To use the feature, run yjit with --yjit-stats and --yjit-trace-exits,
which will record the backtrace for every exit that occurs. Users must save
the output of RubyVM::YJIT.exit_locations to a dump file. That file
can then be read by StackProf to see the code that exited and the
reason.

Example usage:

Given the following script, we write to a file called
concat_array.dump the results of RubyVM::YJIT.exit_locations.

def concat_array
  ["t", "r", *x = "u", "e"].join
end

1000.times do
  concat_array
end

File.write("concat_array.dump", Marshal.dump(RubyVM::YJIT.exit_locations))

When we run the file with this branch and the appropriate flags the
stacktrace will be recorded. Note Stackprof needs to be installed or you
need to point to the library directly.

./ruby --yjit --yjit-call-threshold=1 --yjit-stats --yjit-trace-exits -I/Users/eileencodes/open_source/stackprof/lib test.rb

We can then read the dump file with Stackprof:

./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump

Results will look similar to the following:

==================================
  Mode: ()
  Samples: 1817 (0.00% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1001  (55.1%)        1001  (55.1%)     concatarray
       335  (18.4%)         335  (18.4%)     invokeblock
       178   (9.8%)         178   (9.8%)     send
       140   (7.7%)         140   (7.7%)     opt_getinlinecache
       ...etc...

Simply inspecting the concatarray method will give SOURCE UNAVAILABLE because the source is insns.def.

./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump --method concatarray

Result:

concatarray (nonexistent.def:1)
  samples:  1001 self (55.1%)  /   1001 total (55.1%)
  callers:
    1000  (   99.9%)  Object#concat_array
       1  (    0.1%)  Gem.suffixes
  callees (0 total):
  code:
        SOURCE UNAVAILABLE

However if we go deeper to the callee we can see the exact
source of the concatarray exit.

./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump --method Object#concat_array
Object#concat_array (/Users/eileencodes/open_source/rust_ruby/test.rb:1)
  samples:     0 self (0.0%)  /   1000 total (55.0%)
  callers:
    1000  (  100.0%)  block in <main>
  callees (1000 total):
    1000  (  100.0%)  concatarray
  code:
                                  |     1  | def concat_array
 1000   (55.0%)                   |     2  |   ["t", "r", *x = "u", "e"].join
                                  |     3  | end

The --walk option is recommended for this feature as it make it
easier to traverse the tree of exits.

Goals of this feature:

This feature is meant to give more information when working on YJIT.
The idea is that if we know what code is exiting we can decide what
areas to prioritize when fixing exits. In some cases this means adding
prioritizing avoiding certain exits in yjit. In more complex cases it
might mean changing the Ruby code to be more performant when run with
yjit. Ultimately the more information we have about what code is exiting
AND why, the better we can make yjit.

Known limitations:

  • Due to tracing exits, running this on large codebases like Rails
    can be quite slow.
  • On complex methods it can still be difficult to pinpoint the exact cause of
    an exit.
  • Stackprof is a requirement to to view the backtrace information from
    the dump file

Co-authored-by: Aaron Patterson tenderlove@ruby-lang.org

Ruby Issue: https://bugs.ruby-lang.org/issues/18812

@eileencodes eileencodes force-pushed the yjit-stacktrace-rust3 branch 2 times, most recently from 11283ae to b08eab4 Compare June 1, 2022 15:10
@maximecb
Copy link
Contributor

maximecb commented Jun 1, 2022

Some comments based on the description:

To use the feature, run yjit with --yjit-stats and --yjit-trace-exits

Maybe --yjit-trace-exits should automatically enable --yjit-stats.

File.write("concat_array.dump", Marshal.dump(RubyVM::YJIT.exit_locations))

Could make sense to add a RubyVM::YJIT.dump_exit_locations(file_name) for convenience?

I wish you'd posted the output on railsbench or a larger example so we can see the full potential.

Comment on lines 88 to 87
def test_compile_putobject
assert_compiles('true', insns: %i[putobject], result: true)
assert_compiles('123', insns: %i[putobject], result: 123)
assert_compiles(':foo', insns: %i[putobject], result: :foo)
assert_compiles('true', insns: %i[putobject], result: true, trace_exits: true)
assert_compiles('123', insns: %i[putobject], result: 123, trace_exits: true)
assert_compiles(':foo', insns: %i[putobject], result: :foo, trace_exits: true)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confused as to why you're setting trace_exits: true for some tests but not others. What are you testing? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the tests create so much data that tracing exits slowed them down, so instead I chose a bunch that I know produced exits and tested that the hash has the keys I expect (and aren't empty).

yjit.c Outdated
Comment on lines 119 to 120
// Parses the yjit_raw_samples and yjit_line_samples data into three
// hashes (raw, lines, and frames) to be used by RubyVM::YJIT.exit_locations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are raw samples and line samples?

// If --yjit-trace-exits option is enabled, record the exit stack
// while recording the side exits.
if get_option!(gen_trace_exits) {
mov(cb, RDI, const_ptr_opnd(exit_pc as *const u8));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use C_ARG_REGS[0] instead of directly using RDI

Comment on lines 5965 to 5967
/// For implementing exit location tracking
yjit_raw_samples: Vec<VALUE>,
yjit_line_samples: Vec<i32>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state for this should be stored in stats.rs, not in codegen.rs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a struct just for these values, but @noahgibbs recommended moving it into CodegenGlobals but I don't remember the exact reason why. Unless I totally misunderstood what he meant when we paired on this, @noahgibbs do you remember?

Previously the way I was accessing my struct (needing an instance of it) required me to wrap this in unsafe, but that might have been my implementation. I don't know enough Rust to be confident in that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about moving it into CodegenGlobals to avoid having the additional unsafe dance for a new structure. It sounds like @maximecb is recommending an additional unsafe dance for a new structure -- I don't think she wants it in the stats counters, so it would need its own singleton setup and accessors (including unsafe.)

If it was an additional structure in stats.rb, not in counters, you would indeed need unsafe for it. It's basically a singleton or global at that point, and Rust demands unsafe for situations with possible data races like that :-/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrapped the counter increments in a macro to avoid having to expose the internals of how its accessed.

It just doesn't semantically belong in CodegenGlobals 😅

Comment on lines 221 to 235
while idx < yjit_raw_samples.len() as size_t {
let num = yjit_raw_samples[idx as usize];
let mut i = 0;

idx += 1;

while i < i32::from(num) {
unsafe { rb_gc_mark(yjit_raw_samples[idx as usize]); }
i += 1;
idx += 1;
}

idx += 1;
idx += 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs commenting.

What are raw samples? What kind of values do they contain that needs to be GC'd?

Why does idx +=1 appear twice in a row?

Copy link
Member

@tenderlove tenderlove Jun 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximecb Each frame is wrapped by a GC allocated object that is returned via rb_profile_frames.

@eileencodes correct me if I'm wrong, but IIRC the layout of raw_samples looks something like this:

[ len, frame1, frame2, frameN, exit_insn, count, len, frame1, frame2, frameN, exit_insn, count, ...          ]
  ^------------- First Sample ----------------^  ^--------------- Second Sample -------------^  ^--- etc ---^

The array isn't a homogenous list of VALUE pointers, it also contains the exit instruction (which isn't a VALUE) and also book keeping (the number of times we've seen that particular stack in a row and the actual length of the sample).

The two idx bumps skip those two "non GC allocated" objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should explain that quickly in the comments, maybe mention which source file rb_profile_frames is defined in.

Comment on lines 319 to 323
const BUFF_LEN: usize = 2048;
let mut frames_buffer = [VALUE(0 as usize); BUFF_LEN];
let mut lines_buffer = [0; BUFF_LEN];
let num = unsafe { rb_profile_frames(0, BUFF_LEN as i32, frames_buffer.as_mut_ptr(), lines_buffer.as_mut_ptr()) };
let mut i = num - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does rb_profile_frames do, and where does the 2048 come from?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rb_profile_frames is the CRuby API for getting a "profiling" stack trace. 2048 came from the original stackprof implementation, but I don't think there's any real reason for it. rb_profile_frames doesn't allocate anything, only write in to a buffer you provide. 2048 seemed to be working well for collecting stats on stackprof, so it seemed natural to use here.

@eileencodes eileencodes force-pushed the yjit-stacktrace-rust3 branch 3 times, most recently from 7fd1331 to 82be00d Compare June 8, 2022 16:54
Comment on lines +15 to +19
class TestYJITExitLocations < Test::Unit::TestCase
def test_yjit_trace_exits_and_v_no_error
_stdout, stderr, _status = EnvUtil.invoke_ruby(%w(-v --yjit-trace-exits), '', true, true)
refute_includes(stderr, "NoMethodError")
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea moving this into its own test file 👍

When running with `--yjit-stats` turned on, yjit can inform the user
what the most common exits are. While this is useful information it
doesn't tell you the source location of the code that exited or what the
code that exited looks like. This change intends to fix that.

To use the feature, run yjit with the `--yjit-trace-exits` option,
which will record the backtrace for every exit that occurs. This functionality
requires the stats feature to be turned on. Calling `--yjit-trace-exits`
will automatically set the `--yjit-stats` option.

Users must call `RubyVM::YJIT.dump_exit_locations(filename)` which will
Marshal dump the contents of `RubyVM::YJIT.exit_locations` into a file
based on the passed filename.

*Example usage:*

Given the following script, we write to a file called
`concat_array.dump` the results of `RubyVM::YJIT.exit_locations`.

```ruby
def concat_array
  ["t", "r", *x = "u", "e"].join
end

1000.times do
  concat_array
end

RubyVM::YJIT.dump_exit_locations("concat_array.dump")
```

When we run the file with this branch and the appropriate flags the
stacktrace will be recorded. Note Stackprof needs to be installed or you
need to point to the library directly.

```
./ruby --yjit --yjit-call-threshold=1 --yjit-trace-exits -I/Users/eileencodes/open_source/stackprof/lib test.rb
```

We can then read the dump file with Stackprof:

```
./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump
```

Results will look similar to the following:

```
==================================
  Mode: ()
  Samples: 1817 (0.00% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1001  (55.1%)        1001  (55.1%)     concatarray
       335  (18.4%)         335  (18.4%)     invokeblock
       178   (9.8%)         178   (9.8%)     send
       140   (7.7%)         140   (7.7%)     opt_getinlinecache
       ...etc...
```

Simply inspecting the `concatarray` method will give `SOURCE
UNAVAILABLE` because the source is insns.def.

```
./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump --method concatarray
```

Result:

```
concatarray (nonexistent.def:1)
  samples:  1001 self (55.1%)  /   1001 total (55.1%)
  callers:
    1000  (   99.9%)  Object#concat_array
       1  (    0.1%)  Gem.suffixes
  callees (0 total):
  code:
        SOURCE UNAVAILABLE
```

However if we go deeper to the callee we can see the exact
source of the `concatarray` exit.

```
./ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --text concat_array.dump --method Object#concat_array
```

```
Object#concat_array (/Users/eileencodes/open_source/rust_ruby/test.rb:1)
  samples:     0 self (0.0%)  /   1000 total (55.0%)
  callers:
    1000  (  100.0%)  block in <main>
  callees (1000 total):
    1000  (  100.0%)  concatarray
  code:
                                  |     1  | def concat_array
 1000   (55.0%)                   |     2  |   ["t", "r", *x = "u", "e"].join
                                  |     3  | end
```

The `--walk` option is recommended for this feature as it make it
easier to traverse the tree of exits.

*Goals of this feature:*

This feature is meant to give more information when working on YJIT.
The idea is that if we know what code is exiting we can decide what
areas to prioritize when fixing exits. In some cases this means adding
prioritizing avoiding certain exits in yjit. In more complex cases it
might mean changing the Ruby code to be more performant when run with
yjit. Ultimately the more information we have about what code is exiting
AND why, the better we can make yjit.

*Known limitations:*

* Due to tracing exits, running this on large codebases like Rails
can be quite slow.
* On complex methods it can still be difficult to pinpoint the exact cause of
an exit.
* Stackprof is a requirement to to view the backtrace information from
the dump file.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
@eileencodes eileencodes force-pushed the yjit-stacktrace-rust3 branch from 82be00d to 03ff674 Compare June 9, 2022 12:33
@eileencodes
Copy link
Member Author

Ok I think this is ready. I did the following (which it looks like you already saw @maximecb)

  • Updated --yjit-trace-exits to automatically enable --yjit-stats.
  • Added a RubyVM::YJIT.dump_exit_locations(file_name) for convenience
  • Moved exit locations from codegen globals into stats and made a new struct. I moved the GC code into here as well so we can call YjitExitLocations.gc_mark_raw_samples()
  • Updated comments to describe what each part is doing
  • Moved the tests for this feature into their own file.
  • Got all tests passing

In addition here is an example of using this feature on Rails.

Rails example

Run a Rails test with yjit and trace exits turned on:

RUBYOPT="--yjit --yjit-trace-exits --yjit-call-threshold=1" bundle exec ruby -I/Users/eileencodes/open_source/stackprof/lib/ -Ilib:test test/cases/transactions_test.rb -n test_rollback_dirty_changes

Stackprof output:

==================================
  Mode: ()
  Samples: 96259 (0.00% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
     43208  (44.9%)       43208  (44.9%)     opt_send_without_block
     12264  (12.7%)       12264  (12.7%)     send
     10913  (11.3%)       10913  (11.3%)     invokeblock
      8469   (8.8%)        8469   (8.8%)     getblockparamproxy
      5671   (5.9%)        5671   (5.9%)     invokesuper
      4435   (4.6%)        4435   (4.6%)     opt_getinlinecache
      3681   (3.8%)        3681   (3.8%)     throw
      2028   (2.1%)        2028   (2.1%)     opt_mod
      1190   (1.2%)        1190   (1.2%)     getinstancevariable
      1119   (1.2%)        1119   (1.2%)     opt_eq
       636   (0.7%)         636   (0.7%)     concatarray
       564   (0.6%)         564   (0.6%)     checkmatch
       447   (0.5%)         447   (0.5%)     setlocal_WC_0
       414   (0.4%)         414   (0.4%)     objtostring
       356   (0.4%)         356   (0.4%)     putspecialobject
       286   (0.3%)         286   (0.3%)     getconstant
...

Let's look at the getconstant exit using the --walk option.

ruby -I/Users/eileencodes/open_source/stackprof/lib/ /Users/eileencodes/open_source/stackprof/bin/stackprof --walk rails_exits.dump --method getconstant

Our first page of output will look like this which shows us that there are exits in each location.:

getconstant (nonexistent.def:1)
  samples:   286 self (0.3%)  /    286 total (0.3%)
  callers:
     238  (   83.2%)  ActiveRecord::ConnectionAdapters::AbstractAdapter#type_map
      26  (    9.1%)  URI::Generic.default_port
      18  (    6.3%)  Bundler::URI::Generic.default_port
       2  (    0.7%)  Bundler::URI::Generic.component
       1  (    0.3%)  ActiveRecord::ConnectionAdapters::AbstractAdapter#adapter_name
       1  (    0.3%)  JSON.generator=
  callees (0 total):
  code:
        SOURCE UNAVAILABLE
trace_getconstant (nonexistent.def:1)
  samples:     0 self (0.0%)  /      0 total (0.0%)
  callees (0 total):
  code:
        SOURCE UNAVAILABLE


Select next method:
 1)  ActiveRecord::ConnectionAdapters::AbstractAdapter#type_map
 2)  URI::Generic.default_port
 3)  Bundler::URI::Generic.default_port
 4)  Bundler::URI::Generic.component
 5)  ActiveRecord::ConnectionAdapters::AbstractAdapter#adapter_name
 6)  JSON.generator=
 7)  exit

If we pick the first one we can see the exact code that's causing this exit to happen in ActiveRecord::ConnectionAdapters::AbstractAdapter#type_map.

ActiveRecord::ConnectionAdapters::AbstractAdapter#type_map (/Users/eileencodes/open_source/real_rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:800)
  samples:     0 self (0.0%)  /    238 total (0.2%)
  callers:
     238  (  100.0%)  ActiveRecord::ConnectionAdapters::Quoting#lookup_cast_type
  callees (238 total):
     238  (  100.0%)  getconstant
  code:
                                  |   800  |         def type_map
                                  |   801  |           if key = extended_type_map_key
                                  |   802  |             self.class::EXTENDED_TYPE_MAPS.compute_if_absent(key) do
                                  |   803  |               self.class.extended_type_map(**key)
                                  |   804  |             end
                                  |   805  |           else
  238    (0.2%)                   |   806  |             self.class::TYPE_MAP
                                  |   807  |           end

@maximecb
Copy link
Contributor

maximecb commented Jun 9, 2022

Well done Eileen!

Now that this is complete, I think it would make sense to use the new tool that you've built to pinpoint improvements that we could make. Either improve YJIT to better support rails, or change rails to make it more easily optimizable by YJIT. Something to think about for next steps :)

@maximecb maximecb merged commit 473ee32 into ruby:master Jun 9, 2022
@eileencodes eileencodes deleted the yjit-stacktrace-rust3 branch June 9, 2022 17:03
eileencodes added a commit to eileencodes/ruby that referenced this pull request Jun 9, 2022
I originally added the check for
RubyVM::YJIT.trace_exit_locations_enabled? to fix errors when these
tests run without the stats feature enabled. However I forgot that this
will never be true when this test is booting, so nothing was running
when the stats feature is turned on.

Instead I've decided to make a new hash in the dump file and check if
exit locations are enabled there. If they aren't enabled we return early
to avoid checking for keys that won't exit in the dumped exit locations.
I chose to add this additional enabled check because empty exit
locations might not indicate that stats isn't enabled, it could mean the
feature is entirely broken. I do want these tests to fail if stats are
on and nothing was collected.

Followup to ruby#5970
maximecb pushed a commit that referenced this pull request Jun 9, 2022
I originally added the check for
RubyVM::YJIT.trace_exit_locations_enabled? to fix errors when these
tests run without the stats feature enabled. However I forgot that this
will never be true when this test is booting, so nothing was running
when the stats feature is turned on.

Instead I've decided to make a new hash in the dump file and check if
exit locations are enabled there. If they aren't enabled we return early
to avoid checking for keys that won't exit in the dumped exit locations.
I chose to add this additional enabled check because empty exit
locations might not indicate that stats isn't enabled, it could mean the
feature is entirely broken. I do want these tests to fail if stats are
on and nothing was collected.

Followup to #5970
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants