-
Notifications
You must be signed in to change notification settings - Fork 5.4k
ObjectSpace.dump_all: dump shapes as well #6868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm using this script to test this feature for now: require 'objspace'
ObjectSpace.trace_object_allocations_start
class Foo
def initialize
@a = 1
@b = 2
@c = 3
end
def foo
@foo ||= 1
end
def bar
@bar ||= 1
end
end
GC.start
gen = GC.count
f1 = Foo.new
f1.foo
f1.bar
f2 = Foo.new
f2.bar
f2.foo
ObjectSpace.dump_all(output: STDOUT, since: gen)
puts '-' * 40
puts "f1: #{ObjectSpace.dump(f1)}"
puts "f2: #{ObjectSpace.dump(f2)}"
s = RubyVM::Shape.of(f1)
p [s, s.id, s.type, s.depth, s.edges] The output looks like this:
@jemmaissroff @tenderlove @k0kubun I'd love to hear your opinion on this. |
2221805
to
072c353
Compare
I ran
|
40edcdd
to
32579f4
Compare
@byroot @jemmaissroff and I were working on the same thing (but you beat us to opening a PR). I think this is a good idea.
I added
We had this considered this too, but since the JSON is basically backwards compatible, I figured it's fine to add to the normal dump. Jemma and I added the shape id for each object in #6864, so I think it makes sense to combine the normal heap dump with shape information.
Seems fine to me. |
Sounds good, then I guess I only need to add an argument for partial dump of shapes, I can do that tomorrow. |
I just realized we shouldn't dump the parent_id on the root shape:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
Thanks for this! Incidentally, we started working on this exact same idea EOD yesterday. To answer your questions:
Agreed.
@tenderlove added it to vm stats in #6798!
We thought about this question yesterday too. It seems like because
If we don't go with the |
Sorry for the duplicated work, I guess we got startled by the same growing shape count report 😄. BTW I think this PR is already in a good enough state to debug that issue @k0kubun, if you wanna try it? |
Ah yes, I'd love to try it. How do you intend to use this on our app? Just sending an entire dump to the log storage on the Rack middleware? |
32579f4
to
64a8ce0
Compare
No worries at all! And I didn't notice Aaron had already commented above too so more duplication 🤦♀️ |
So after experimenting with this, I think one interesting field missing is |
6a0dc17
to
f169ea9
Compare
template/id.h.tmpl
Outdated
@@ -44,6 +44,8 @@ enum ruby_id_types { | |||
#define ID_JUNK RUBY_ID_JUNK | |||
#define ID_INTERNAL RUBY_ID_INTERNAL | |||
|
|||
#define ID_INTERNAL_P(id) ((id & RUBY_ID_SCOPE_MASK) == RUBY_ID_INTERNAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extracted this so at least it feels a bit less dirty to check for internal ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it already exists in symbol.h
as is_junk_id()
.
f169ea9
to
0159e1e
Compare
Hum, I still have a segfault when printing I don't think I'll have much more time to debug this tonight. |
0159e1e
to
0ca146a
Compare
0ca146a
to
49b634f
Compare
shape.c
Outdated
@@ -196,6 +209,7 @@ rb_shape_transition_shape_frozen(VALUE obj) | |||
rb_shape_t * | |||
rb_shape_get_next_iv_shape(rb_shape_t* shape, ID id) | |||
{ | |||
RUBY_ASSERT(is_junk_id(id) || RTEST(rb_sym2str(ID2SYM(id)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this helped crash when the unexpected IDs are set:
-- C level backtrace information -------------------------------------------
/home/runner/work/ruby/ruby/build/ruby(rb_print_backtrace+0x818) [0x55e04d7b1eb8] ../src/vm_dump.c:770
/home/runner/work/ruby/ruby/build/ruby(rb_vm_bugreport) ../src/vm_dump.c:1065
/home/runner/work/ruby/ruby/src/tool/lib/test/unit/parallel.rb: TestRubyLiteral#test_debug_frozen_string_in_array_literal(rb_assert_failure+0x77) [0x55e04d5bb0b2]
/home/runner/work/ruby/ruby/build/ruby(rb_shape_get_next_iv_shape+0x1f) [0x55e04d5a50e8] ../src/shape.c:212
/home/runner/work/ruby/ruby/build/ruby(rb_multi_ractor_p) ../src/shape.c:210
/home/runner/work/ruby/ruby/build/ruby(rb_vm_lock_enter) ../src/vm_sync.h:74
/home/runner/work/ruby/ruby/build/ruby(get_next_shape_internal) ../src/shape.c:122
/home/runner/work/ruby/ruby/build/ruby(rb_shape_get_next_iv_shape) ../src/shape.c:213
/home/runner/work/ruby/ruby/build/ruby(rb_shape_get_next) ../src/shape.c:219
/home/runner/work/ruby/ruby/build/ruby(generic_ivar_set+0x1ae) [0x55e04d77ab0e] ../src/variable.c:1286
/home/runner/work/ruby/ruby/build/ruby(ivar_set) ../src/variable.c:1547
/home/runner/work/ruby/ruby/build/ruby(rb_ivar_set) ../src/variable.c:1556
/home/runner/work/ruby/ruby/build/ruby(compile_array+0x531) [0x55e04d90e101]
/home/runner/work/ruby/ruby/build/ruby(iseq_compile_each0+0x261e) [0x55e04d9099ae]
The ID
in question seem to be id_debug_created_info
, which I can't really figure out what type it is.
I could use some help, but I think RUBY_ID_INTERNAL
is far from covering all ids we shouldn't try to map to a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id_debug_created_info
looks like one of those special ids that is < tLAST_OP_ID
, if that helps at all. is_notop_id
checks for those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just some value in an enum. There is no associated string for that particular ID. Not all IDs have string representations, so we need to dump something else as the edge name.
It's defined in id.h
. Unfortunately that file is generated, but you can find the source for it here.
When we derive the key name, we should probably look at the shape type. Only IVAR shapes will have an edge name that may have a string representation, but it's not guaranteed. As @composerinteralia, I think is_notop_id
is the right check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We looked through id.def
and id.c.tmpl
and it looks like id_debug_created_info
is the only id that works like this 😓. It's defined in id.def
to be -
, which carries the special meaning that it has no defined string which looks like it was done to hide them from marshal, which skips them by just checking if there is a string associated.
The easiest path forward is probably to duplicate that check and skip any id where !rb_id2str(id);
In the future we might consider having a different way for marshal to skip undesirable ivars (I'd be interested in looking into this, but not so close to release 😅).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if we wanted to be conservative we could print out only ids where is_instance_id(x)
(only the ones which start with an @
), we might miss some internal things, but that should be really safe to do (and since this is for debugging we may not need it perfect)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I tried a few more things, but ultimately ended up using is_instance_id
like you suggested. Thanks for the help!
60b750f
to
436e025
Compare
I see several arguments in doing so. First they use a non trivial amount of memory, so for various memory profiling/mapping tools it is relevant to have visibility of the space occupied by shapes. Then, some pathological code can create a tons of shape, so it is valuable to have a way to have a way to observe shapes without having to compile Ruby with `SHAPE_DEBUG=1`. And additionally it's likely much faster to dump then this way than to use `RubyVM::Shape`. There are however a few open questions: - Shapes can't respect the `since:` argument. Not sure what to do when it is provided. Would probably make sense to not dump them. - Maybe it would make more sense to have a separate `ObjectSpace.dump_shapes`? - Maybe instead `dump_all` should take a `shapes: false` argument? Additionally, `ObjectSpace.dump_shapes` is added for the use case of debugging the evolution of the shape tree.
436e025
to
a986aa5
Compare
I think this is good to go now. Anyone wants to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ LGTM, thanks for writing! (I don't have approve permissions)
Tested and working in our app 🚀 |
I see several arguments in doing so.
First they use a non trivial amount of memory, so for various memory profiling/mapping tools it is relevant to have visibility of the space occupied by shapes.
Then, some pathological code can create a tons of shape, so it is valuable have a way to observe shapes without having to compile Ruby with
SHAPE_DEBUG=1
.And additionally it's likely much faster to dump them this way than to use
RubyVM::Shape
.There are however a few open questions:
since:
argument. Not sure what to do when it is provided. Would probably make sense to not dump them.RubyVM::Shape.next_shape_id
was available on all builds, we could use that as a cursor for differential dumps.ObjectSpace.dump_shapes
?dump_all
should take ashapes: false
argument?Additionally,
ObjectSpace.dump_shapes
is added for the use case of debugging the evolution of the shape tree.