Skip to content

ZJIT: Fix ObjToString rewrite #14196

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 13, 2025
Merged

Conversation

st0012
Copy link
Member

@st0012 st0012 commented Aug 12, 2025

Currently, the rewrite for ObjToString always replaces it with a SendWithoutBlock(to_s) instruction when the receiver is not a string literal. This is incorrect because it calls to_s on the receiver even if it's already a string.

This change fixes it by:

  • Avoiding the SendWithoutBlock(to_s) rewrite
  • Implement codegen for ObjToString

@matzbot matzbot requested a review from a team August 12, 2025 20:52
@st0012 st0012 linked an issue Aug 12, 2025 that may be closed by this pull request
// Need to replicate what CALL_SIMPLE_METHOD does
asm_comment!(asm, "side-exit if rb_vm_objtostring returns Qundef");
asm.cmp(ret, Qundef.into());
asm.je(side_exit(jit, state, ObjToStringFallback)?);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to implement the conditional "call to_s on val when we got Qundef" without generating a lot of assembly (e.g. calling gen_send_without_block and je to it). So I decided to side exit in that case for now. This will at least fix the bug we're seeing
Thoughts?

Copy link
Member

@k0kubun k0kubun Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can profile the receiver object, guard on the type (and side-exit), and generate only what's supposed to be executed. This is what YJIT does.


So I decided to side exit in that case for now.

I also agree it's fine for now, but if you're interested in fixing it in this PR, I would suggest:

Profile the receiver type. Generate one of the following HIRs:

  1. For T_STRING: GuardType(String)
  2. (Optional: For T_SYMBOL/T_MODULE/T_CLASS/T_NIL/T_TRUE/T_FALSE/T_FIXNUM: GuardType(type)ObjToString that side-exits on Qundef like the current PR)
  3. For others: GuardTypeNot(String) (new HIR) → SendWithoutBlock

Behavior-wise, (1) + (3) should fix any issues. (2) is more of an optional optimization that you can leverage your current patch for. GuardTypeNot(String) shouldn't be too hard to implement if you handle only T_STRING (and bail out otherwise) for now.

Copy link
Member

@XrXr XrXr Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of hiding a send in a non-send HIR instruction either due to optimizer visibility etc. One way around this is to have insn::ObjToString only call rb_vm_objtostring() and build the if-else logic into HIR during parsing. This way we have a regular send in the else BB, and the optimizer should still be able to handle folding away string input situations.

To go faster, see Kokubun's suggestion on how to speculate.

It still sounds quite involved to me though. I think side exiting for now is fine and we can make an issue and optimize this later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestions. I created Shopify#701 as a follow up.

For now I'll keep side exit so we can avoid the incorrect behaviour.

This comment has been minimized.

@k0kubun
Copy link
Member

k0kubun commented Aug 12, 2025

Could we just guard that the receiver is not T_STRING and leave a method call handling in the JIT code? If the fallback case is a method call, it's easy to understand that you can optimize it like a regular method call, but once you delegate the method call to a C function, you cannot easily touch it. When we add a specialized codegen for to_s C functions, we'd like to optimize it using them.

That's the bugfix, but ideally you should also profile the receiver object and generate different HIR (guard that it's T_STRING and do nothing else) to avoid exiting because "the receiver is not T_STRING".

I updated the comment at #14196 (comment) and merged my suggestions there.

@k0kubun
Copy link
Member

k0kubun commented Aug 12, 2025

If it's too complicated to figure out the right HIR for it now, I guess it's fine to start with this and work on it in a separate PR.

Could you file another issue to optimize ObjToString further as a reminder? We could probably optimize away gen_prepare_non_leaf_call if we guard on types, and avoid side-exiting on types that are not handled in vm_objtostring.

@@ -439,6 +439,24 @@ fn gen_get_ep(asm: &mut Assembler, level: u32) -> Opnd {
ep_opnd
}

fn gen_objtostring(jit: &mut JITState, asm: &mut Assembler, val: Opnd, cd: *const rb_call_data, state: &FrameState) -> Option<Opnd> {
gen_prepare_non_leaf_call(jit, asm, state)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out which part of rb_vm_objtostring calls a method. Could you point out which part of it is not leaf?

Copy link
Member

@XrXr XrXr Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When recv.singleton_class? rb_mod_to_s() calls can call inspect on the attached object. Diabolical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I realized there's rb_inspect in the non-rb_any_to_s branch, which does rb_funcall. So yeah, rb_vm_objtostring is not leaf.

It doesn't matter at this point, but I didn't get what's not leaf about rb_any_to_s. Does rb_sprintf("#<%"PRIsVALUE":%p>", cname, (void*)obj) dispatch a method through PRIsVALUE? (I don't know which code formats "%"PRIsVALUE)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in general PRIsVALUE can call to_s by going through PRI_EXTRA_MARK and ruby__sfvextra(). Only for non-T_STRINGs does ruby__sfvextra() make a call, though, and it looks like cname is always a string here. Though there is also an encoding check in there so the print might rise.

Currently, the rewrite for `ObjToString` always replaces it with a
`SendWithoutBlock(to_s)` instruction when the receiver is not a
string literal. This is incorrect because it calls `to_s` on the
receiver even if it's already a string.

This change fixes it by:
- Avoiding the `SendWithoutBlock(to_s)` rewrite
- Implement codegen for `ObjToString`
@st0012 st0012 force-pushed the zjit-fix-objtostring branch from 9492f5f to bed681e Compare August 13, 2025 18:56
@k0kubun k0kubun merged commit 2b16f27 into ruby:master Aug 13, 2025
84 of 86 checks passed
@k0kubun k0kubun deleted the zjit-fix-objtostring branch August 13, 2025 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ZJIT: Fix optimization for Insn::ObjToString
4 participants