-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Tail call VM [2] #18720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Tail call VM [2] #18720
Conversation
Zend/zend_compile.h
Outdated
@@ -135,7 +136,7 @@ void zend_const_expr_to_zval(zval *result, zend_ast **ast_ptr, bool allow_dynami | |||
typedef int (*user_opcode_handler_t) (zend_execute_data *execute_data); | |||
|
|||
struct _zend_op { | |||
const void *handler; | |||
zend_vm_opcode_handler_t handler; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typed handler pointers in a few places to prevent confusion between orig handler and call handlers (zend_vm_opcode_handler_t / zend_vm_opcode_handler_func_t).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZEND_OPCODE_HANDLER_RET ZEND_FASTCALL zend_jit_func_trace_helper(ZEND_OPCODE_HANDLER_ARGS) | ||
ZEND_OPCODE_HANDLER_RET ZEND_OPCODE_HANDLER_CCONV zend_jit_func_trace_helper(ZEND_OPCODE_HANDLER_ARGS) | ||
{ | ||
ZEND_OPCODE_TAIL_CALL_EX(zend_jit_trace_counter_helper, | ||
((ZEND_JIT_COUNTER_INIT + JIT_G(hot_func) - 1) / JIT_G(hot_func))); | ||
zend_jit_op_array_trace_extension *jit_extension = | ||
(zend_jit_op_array_trace_extension*)ZEND_FUNC_INFO(&EX(func)->op_array); | ||
size_t offset = jit_extension->offset; | ||
uint32_t cost = ((ZEND_JIT_COUNTER_INIT + JIT_G(hot_func) - 1) / JIT_G(hot_func)); | ||
|
||
*(ZEND_OP_TRACE_INFO(opline, offset)->counter) -= cost; | ||
|
||
ZEND_OPCODE_TAIL_CALL(zend_jit_trace_counter_helper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tail-calling was not possible due to the extra arg in zend_jit_trace_counter_helper
, so I removed it.
Alternative would be to define these handlers in IR, as we do for the hybrid JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why extra_arg became a problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang ˋmusttailˋ disallows calling functions with a different signature. Presumably this is to garantee portability. This is discussed here: https://blog.reverberate.org/2025/02/10/tail-call-updates.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea makes sense and the benchmark results look promising.
It's also interesting how it affects the VM code size.
Unfortunately at this moment I can't review a huge path like this in all details.
Anyway, I think it makes sense to finalize and land it.
out($f,"typedef struct _zend_vm_trampoline {\n"); | ||
out($f," const zend_op *opline;\n"); | ||
out($f," zend_vm_opcode_handler_t handler;\n"); | ||
out($f,"} zend_vm_trampoline;\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C standard doesn't require returning content of struct in a pair of registers.
x86_64 ABI does this, but all other ABIs have to be checked (x86, Windows64, ...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
SysV amd64 and aarch64 return the struct via registers, but SysV i386 and MS ABI do not.
Tail calling is effectively only enabled on clang, so this should not be an issue on Windows, at least currently.
It is an issue on x86 however.
I've made the following changes:
- Added
__attribute__((sysv_abi))
on handlers returningzend_vm_trampoline
, when compiling with Clang on Windows, so the struct is returned via registers. - Enable tail calling only on x86_64 and arm64 (Edit:
preserve_none
is not supported on x86 anyway)
ext/opcache/jit/zend_jit_ir.c
Outdated
* https://github.com/llvm/llvm-project/blob/a414877a7a5f000d01370acb1162eb1dea87f48c/llvm/lib/Target/X86/X86RegisterInfo.cpp#L319 | ||
* https://github.com/llvm/llvm-project/blob/68bfe91b5a34f80dbcc4f0a7fa5d7aa1cdf959c2/llvm/lib/Target/X86/X86CallingConv.td#L1183 | ||
*/ | ||
jit->ctx.fixed_regset |= (1<<5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed_regs
are the registers reserved for the register variables.
I remember there were some problems with RBP usage, but I don't remember the exact problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I add the register only to prevent the RA from using it. This is less expensive than adding it to the set of preserved registers. The same thing is done for the HYBRID VM a few lines below.
ZEND_OPCODE_HANDLER_RET ZEND_FASTCALL zend_jit_func_trace_helper(ZEND_OPCODE_HANDLER_ARGS) | ||
ZEND_OPCODE_HANDLER_RET ZEND_OPCODE_HANDLER_CCONV zend_jit_func_trace_helper(ZEND_OPCODE_HANDLER_ARGS) | ||
{ | ||
ZEND_OPCODE_TAIL_CALL_EX(zend_jit_trace_counter_helper, | ||
((ZEND_JIT_COUNTER_INIT + JIT_G(hot_func) - 1) / JIT_G(hot_func))); | ||
zend_jit_op_array_trace_extension *jit_extension = | ||
(zend_jit_op_array_trace_extension*)ZEND_FUNC_INFO(&EX(func)->op_array); | ||
size_t offset = jit_extension->offset; | ||
uint32_t cost = ((ZEND_JIT_COUNTER_INIT + JIT_G(hot_func) - 1) / JIT_G(hot_func)); | ||
|
||
*(ZEND_OP_TRACE_INFO(opline, offset)->counter) -= cost; | ||
|
||
ZEND_OPCODE_TAIL_CALL(zend_jit_trace_counter_helper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why extra_arg became a problem
c8cb9d5
to
e334ae4
Compare
@dstogov I've addressed all issues, and all necessary IR changes have been merged, so this branch is now ready to be reviewed again, and hopefully merged :) Since your last review I've made the following notable change: As adding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my questions...
uintptr_t __attribute__((preserve_none)) test(void) { | ||
uintptr_t ret; | ||
#if defined(__x86_64__) | ||
__asm__ __volatile__( | ||
/* XORing to make it unlikely the value exists in any other register */ | ||
"movq %1, %%r12\n" | ||
"xorq %3, %%r12\n" | ||
"movq %2, %%r13\n" | ||
"xorq %3, %%r13\n" | ||
"xorq %%rax, %%rax\n" | ||
"call fun\n" | ||
: "=a" (ret) | ||
: "r" (const1), "r" (const2), "r" (key) | ||
: "r12", "r13" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these asm tests?
preserve_none
is an unstable CLANG extension, but can't we just test CLANG version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests ensure that the tailcall VM will be enabled in all future versions of Clang as long as these expectations remain valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test will just hide the problem in case of possible incompatibility with future versions.
I wouldn't care about compatibility with future versions. They always may break something.
#if ZEND_VM_KIND == ZEND_VM_KIND_CALL | ||
return op->handler; | ||
#elif ZEND_VM_KIND == ZEND_VM_KIND_HYBRID | ||
#if ZEND_VM_KIND == ZEND_VM_KIND_HYBRID || ZEND_VM_TAIL_CALL_DISPATCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use a separate define ZEND_VM_TAIL_CALL_DISPATCH
?
May be it's better to use ZEND_VM_KIND_TAIL_CALL
?
This would look more consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I will check
#if __has_attribute(sysv_abi) | ||
# define HAVE_SYSV_ABI | ||
# define ZEND_SYSV_ABI __attribute__((sysv_abi)) | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we have sysv_abi
attribute, doesn't mean that SYSV_ABI is the default calling convention.
According to CLANG manual it's available on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to CLANG manual it's available on Windows.
Yes that's the point :) This is related to #18720 (comment), but this is not absolutely required as we don't support building with Clang on Windows yet, so I can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. I saw that you use it especially for Windows.
// Helper with parameter. Must use trampoline dispatch due to | ||
// incompatible signature for tailcall. | ||
out($f, "#undef ZEND_VM_DISPATCH\n"); | ||
out($f, "#define ZEND_VM_DISPATCH(handler) ZEND_VM_DISPATCH_NOTAIL(handler)\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeatable redefinition of the same macro looks very weird.
Can't we redefine it once - between generating code for TAIL_CALL and regular CALL handlers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this before VM helpers that take extra arguments, because __attribute__((musttail))
doesn't allow them to tailcall to helpers or handlers with a different number of args.
Currently, VM helpers with extra arguments are interleaved with normal VM helpers and opcode handlers.
Avoiding the repeated redefinition would require to group VM helpers with extra args together, however this may change code locality.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that all TAIL_CALL handlers and helpers should be generated after/before all CALL handlers and helpers.
@@ -1746,7 +1774,8 @@ function gen_executor_code($f, $spec, $kind, $prolog, &$switch_labels = array()) | |||
// Generate handler for undefined opcodes | |||
switch ($kind) { | |||
case ZEND_VM_KIND_CALL: | |||
gen_null_handler($f); | |||
gen_null_handler($f, $variant); | |||
gen_halt_handler($f, $variant); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose halt_handler is necessary for tail_call variant only.
# define IR_OPCODE_HANDLER_RET IR_ADDR | ||
#endif | ||
|
||
#define IR_OPCODE_HANDLER_FUNC IR_FASTCALL_FUNC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this new name?
# define ir_CONST_OPCODE_HANDLER_FUNC(_addr) \ | ||
jit_CONST_OPCODE_HANDLER_FUNC(jit, _addr) | ||
# define ir_CAST_OPCODE_HANDLER_FUNC(_addr) ir_fold2(_ir_CTX, IR_OPT(IR_PROTO, IR_ADDR), (_addr), \ | ||
ir_proto_0(_ir_CTX, IR_OPCODE_HANDLER_FUNC, IR_I32)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about IR_I32
.
jit->ctx.fixed_regset |= (1<<IR_REG_FP) | (1<<IR_REG_LR); | ||
#else | ||
ZEND_UNREACHABLE(); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jit->ctx.fixed_stack_frame_size
assigned above may be reduced as we don't need space to save/restore any register now.
# define ZREG_FP 12 /* IR_REG_R12 */ | ||
# define ZREG_IP 13 /* IR_REG_R13 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These registers are not marked as "fixed" any more, so IR may reuse them for something else.
At least it should be used for passing arguments to helper functions.
On the other hand, they are still used in RLOAD/RSTORE instructions.
I'm surprised how this can work. What do I miss?
opline->opcode == ZEND_GENERATOR_CREATE) { | ||
opline->opcode == ZEND_GENERATOR_CREATE || | ||
opline->opcode == ZEND_INCLUDE_OR_EVAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an independent fix?
Related:
This part takes tail-calling and
preserve_none
from #17849:preserve_none
reduces register saving overhead in opcode handlersThis also implements JIT support.
Non-dispatching opcode handlers
JIT needs non-dispatching opcode handlers (opcode handlers that return instead of calling the next one). I've tried two approaches for this:
call_handler
zend_op->handler
is a function that calls the real handler and dispatches.I've tried both approach (the first one in this branch, and the second one in master...arnaud-lb:php-src:hybrid-tailcall.
The second approach resulted in a slightly slower VM due to indirect dispatching, and JIT generated more spilling when calling handlers as they clobber all registers.
Therefore I've taken the first approach in this PR.
A 3rd approach would be to control dispatching via an additional handler parameter, or to pass a dispatch function to handlers, but I suspect this would have been slower.
Fixed regs and preserved regs
Thanks to the
preserve_none
convention, JIT'ed code only has to preserverbp
, which reduces the size of prologue/epilogue. Instead of preserving it, I add it to the set of fixed registers, so it's not used. This results in faster code.Also, quite conveniently,
preserve_none
receives its first arguments via registers that are callee-saved in sysv. Therefore we can use the arg1 and arg2 regs as our fixed registers. This avoids moving arg1 and arg2 to SP/IP in prologue, or setting arg1/arg2 when tail-calling other handlers.Benchmarks:
base: Clang build of master, wall time
gcc: GCC build of master, wall time
valgrind: Clang build of master, valgrind instructions
Conclusion: Clang builds are now as fast GCC builds on the Symfony Demo benchmark in both JIT and non-JIT modes.
Issues
preserve_none
calling convention is documented as unstable. JIT would break if it changed. I suggest checking this at build time, and disabling this optimization (tailcalling + preserve_none) ifpreserve_none
changed. Edit: added a configure check.preserve_none
and ASAN, therefore we disable this if ASAN is enabled. Edit: This seems to be fixed in recent Clang versions.UPGRADING
TODO