Skip to content

Expose need_major_gc via GC.latest_gc_info #6791

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 10, 2022

Conversation

mirek26
Copy link
Contributor

@mirek26 mirek26 commented Nov 22, 2022

We trigger GC out-of-band to reduce latency impact on synchronous request processing. It’s not perfect, but it works. One thing we really want to avoid is major GC during request processing - so we don’t want to start request processing in a state where need_major_gc is set. If there is a risk of major GC, we prefer to run it out-of-band, where it doesn't matter for request latency.

Unfortunately, this state is currently not exposed - so we can’t know whether the next GC is set to be major or not. As a workaround, we trigger minor GC twice in a row - if a major GC threshold is hit during the first GC, the second GC is upgraded to major and there is no immediate risk of another major GC. This mostly works, but it's not great and it's not efficient (the second GC is not cheap and it's unnecessary in most cases); this PR makes it possible to do that in a better way by exposing need_major_gc.

It may seem strange to expose it under GC.latest_gc_info but I think it’s a reasonable place because the decision was made during the latest GC.

@mirek26
Copy link
Contributor Author

mirek26 commented Dec 9, 2022

@tenderlove @peterzhu2118 - I see you've done GC-related work recently. Would you be able to review this, provide feedback, or route to someone else who could take a look? We've been running this at Stripe for a while and would love to get it upstream so that we can remove our custom patch in the future

Copy link
Member

@peterzhu2118 peterzhu2118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. Can you add a test for this?

@@ -11020,6 +11022,19 @@ gc_info_decode(rb_objspace_t *objspace, const VALUE hash_or_key, const unsigned
Qnil;
SET(major_by, major_by);

if (orig_flags == 0) { /* set need_major_by only if flags not set explicitly */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since need_major_flags is only needed inside the if block, we can move it in.

Suggested change
if (orig_flags == 0) { /* set need_major_by only if flags not set explicitly */
if (orig_flags == 0) { /* set need_major_by only if flags not set explicitly */
unsigned int need_major_flags = objspace->rgengc.need_major_gc;

@mirek26 mirek26 force-pushed the mirek-need-major-gc branch from 10d3ebe to 38f1b7d Compare December 10, 2022 01:42
@mirek26
Copy link
Contributor Author

mirek26 commented Dec 10, 2022

@peterzhu2118 thanks for looking!

Moved need_major_flags and added a test. Getting to a state where need_major_gc is set a bit tricky, so the test is basically "allocate objects and verify need_major_gc is set before major gc happens", which should be pretty robust. Let me know if you have some other ideas

Copy link
Member

@peterzhu2118 peterzhu2118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, thank you for your contribution!

@peterzhu2118 peterzhu2118 merged commit ea613c6 into ruby:master Dec 10, 2022
@ko1
Copy link
Contributor

ko1 commented Dec 22, 2022

Sorry I missed it.

I feel strange because major_by and need_major_by lives in same data and it is confusing for me.
major_by is information about the last GC and need_major_by is for the next GC.

I don't against this feature, but I think it was better that the name implies next GC's information.

@mirek26
Copy link
Contributor Author

mirek26 commented Dec 22, 2022

@ko1 I agree it's a bit strange - I commented on it in the description:

It may seem strange to expose it under GC.latest_gc_info but I think it’s a reasonable place because the decision was made during the latest GC.

I don't feel strongly about the name - happy to change it if you have a better suggestion! I like need_major_by because it's consistent with internal naming

@ko1
Copy link
Contributor

ko1 commented Dec 23, 2022

On the other hand I don't think it is strange in latest_gc_info because as you wrote need_major_by is calc by latest GC's result.

I like need_major_by because it's consistent with internal naming

I agree about it. But internal naming can be changed easily, but exposed naming is difficult to change (for example, if we will change it on Ruby 3.3, your tool depend on it will be broken). This is why we are discussing naming long time.

Anyway, it is too late so I left it. Next time, please ask me or please file it on bugs.ruby-lang.org and Matz will decide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants