Tail call VM #17849
base: master
@@ -313,6 +313,18 @@ char *alloca();
 # define ZEND_FASTCALL
 #endif

+#if __has_attribute(preserve_none) && !defined(__SANITIZE_ADDRESS__)
There is an incompatibility between preserve_none and ASAN, which crashes Clang. I will report the issue.
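The guard in the diff above only enables preserve_none when the compiler supports it and ASAN is off. Below is a minimal, self-contained sketch of such a guard, assuming a compiler that may lack `__has_attribute`/`__has_feature` entirely (the `#define __has_attribute(x) 0` fallback is the usual portability idiom). `VM_PRESERVE_NONE`, `VM_ASAN_ENABLED`, `vm_add_one` and `vm_config_check` are hypothetical names, not the PR's. Checking `__has_feature(address_sanitizer)` in addition to `__SANITIZE_ADDRESS__` is an extra assumption of this sketch: it is the detection method documented by Clang.

```c
#include <assert.h>

/* Portability fallbacks: compilers without these builtins answer "no". */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif
#ifndef __has_feature
# define __has_feature(x) 0
#endif

/* Detect ASAN through either spelling (GCC-style macro or Clang feature check). */
#if defined(__SANITIZE_ADDRESS__) || __has_feature(address_sanitizer)
# define VM_ASAN_ENABLED 1
#else
# define VM_ASAN_ENABLED 0
#endif

/* Only opt into preserve_none when it is supported and ASAN is off. */
#if __has_attribute(preserve_none) && !VM_ASAN_ENABLED
# define VM_PRESERVE_NONE __attribute__((preserve_none))
# define VM_PRESERVE_NONE_ENABLED 1
#else
# define VM_PRESERVE_NONE
# define VM_PRESERVE_NONE_ENABLED 0
#endif

/* Hypothetical function: its C prototype is the same either way. */
static VM_PRESERVE_NONE long vm_add_one(long x)
{
	return x + 1;
}

/* Invariant of the guard: never preserve_none together with ASAN. */
static void vm_config_check(void)
{
	assert(!(VM_PRESERVE_NONE_ENABLED && VM_ASAN_ENABLED));
}
```

Callers do not need to know which branch was taken: `vm_add_one(41)` returns 42 with or without the attribute.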
@@ -8212,9 +8212,9 @@ ZEND_VM_HANDLER(150, ZEND_USER_OPCODE, ANY, ANY)
 case ZEND_USER_OPCODE_LEAVE:
 ZEND_VM_LEAVE();
 case ZEND_USER_OPCODE_DISPATCH:
-ZEND_VM_DISPATCH(opline->opcode, opline);
+ZEND_VM_DISPATCH_OPCODE(opline->opcode, opline);
Renamed this rarely used macro so I could re-use its name
@@ -1,3 +1,5 @@
+#include "Zend/zend_vm_opcodes.h"
This makes language servers / IDEs happy when viewing zend_vm_execute.h
$str .= "#include <main/php_config.h>\n";
$str .= "#include \"Zend/zend_portability.h\"\n";
This makes language servers / IDEs happy when viewing zend_vm_opcodes.h
Interesting work! I suppose this will require special support for JIT.
Yes, this does require some changes to the JIT to accommodate the new opcode handler signature and how FP/IP are passed around. I plan to implement them unless there are major issues with the current approach. The fact that
The second one seems reasonable to me.
@@ -21,6 +21,9 @@
 #ifndef ZEND_VM_OPCODES_H
 #define ZEND_VM_OPCODES_H

+#include <main/php_config.h>
Is it possible to avoid this dependency on main?
The HYBRID VM generates two handlers for each opcode (a C function with the standard ABI, plus a non-standard GOTO handler). The JIT uses one or the other where suitable. Technically, a tail call does the same as the GOTO, so the same approach might work. CLANG doesn't support global register variables. LLVM may achieve a similar thing using a custom calling convention that pins arguments to registers (this technique is used for Haskell, Erlang, HHVM, ...). Unfortunately, I didn't find a way to introduce a new calling convention without patching LLVM (cool OOP style). Using them in CLANG was also problematic. That was a long time ago, and maybe something has changed.
BTW, LLVM/CLANG should support local register variables, so maybe the GOTO and HYBRID VMs could be adopted.
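To make the two dispatch styles mentioned above concrete, here is a toy sketch. It is not Zend code: the opcodes, `toy_op` and the handler names are hypothetical. `run_call` dispatches each opcode to a separate C function from a loop, as the call-based VM does. `run_goto` uses labels-as-values (computed goto, a GCC/Clang extension), as the GOTO/HYBRID VM does. For brevity, the toy looks the label address up in a table, whereas the HYBRID VM stores it directly in `opline->handler`.

```c
#include <assert.h>

/* Toy opcodes (hypothetical). */
enum { OP_INC, OP_DOUBLE, OP_HALT };

typedef struct { int opcode; } toy_op;

/* CALL style: one C function per opcode, invoked from a dispatch loop.
 * A handler returns nonzero to stop the loop. */
typedef int (*toy_handler)(long *acc);

static int h_inc(long *acc)    { *acc += 1; return 0; }
static int h_double(long *acc) { *acc *= 2; return 0; }
static int h_halt(long *acc)   { (void) acc; return 1; }

static long run_call(const toy_op *ops)
{
	static const toy_handler table[] = { h_inc, h_double, h_halt };
	long acc = 0;

	for (const toy_op *opline = ops; ; opline++) {
		assert(opline->opcode >= OP_INC && opline->opcode <= OP_HALT);
		if (table[opline->opcode](&acc)) {
			return acc;
		}
	}
}

/* GOTO style: labels-as-values. acc and opline are locals of a single
 * function, so the compiler can keep them in registers across ops. */
static long run_goto(const toy_op *ops)
{
	static void *const labels[] = { &&L_INC, &&L_DOUBLE, &&L_HALT };
	const toy_op *opline = ops;
	long acc = 0;

#define TOY_DISPATCH() goto *labels[opline->opcode]
	TOY_DISPATCH();
L_INC:
	acc += 1;
	opline++;
	TOY_DISPATCH();
L_DOUBLE:
	acc *= 2;
	opline++;
	TOY_DISPATCH();
L_HALT:
	return acc;
#undef TOY_DISPATCH
}
```

Both functions compute the same result. For the program inc, inc, double, inc, halt, both return 5. Only the dispatch mechanism differs.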
This changes the signature of opcode handlers in the CALL VM so that the opline is passed directly via arguments. This reduces the number of memory operations on EX(opline), and makes the CALL VM considerably faster. Additionally, this unifies the CALL and HYBRID VMs a bit, as EX(opline) is now handled in the same way in both VMs.

This is a part of GH-17849.

Currently we have two VMs:

 * HYBRID: Used when compiling with GCC. execute_data and opline are global register variables
 * CALL: Used when compiling with something else. execute_data is passed as opcode handler arg, but opline is passed via execute_data->opline (EX(opline)).

The CALL VM looks like this:

    while (1) {
        ret = execute_data->opline->handler(execute_data);
        if (UNEXPECTED(ret != 0)) {
            if (ret > 0) {
                // returned by ZEND_VM_ENTER() / ZEND_VM_LEAVE()
                execute_data = EG(current_execute_data);
            } else {
                // returned by ZEND_VM_RETURN()
                return;
            }
        }
    }

    // example op handler
    int ZEND_INIT_FCALL_SPEC_CONST_HANDLER(zend_execute_data *execute_data) {
        // load opline
        const zend_op *opline = execute_data->opline;

        // instruction execution

        // dispatch
        // ZEND_VM_NEXT_OPCODE():
        execute_data->opline++;
        return 0; // ZEND_VM_CONTINUE()
    }

Opcode handlers return a positive value to signal that the loop must load a new execute_data from EG(current_execute_data), typically when entering or leaving a function.
Here I make the following changes:

 * Pass opline as opcode handler argument
 * Return next opline from opcode handlers
 * ZEND_VM_ENTER / ZEND_VM_LEAVE return opline|(1<<0) to signal that execute_data must be reloaded from EG(current_execute_data)

This gives us:

    while (1) {
        opline = opline->handler(execute_data, opline);
        if (UNEXPECTED((uintptr_t) opline & ZEND_VM_ENTER_BIT)) {
            opline = (const zend_op *) ((uintptr_t) opline & ~ZEND_VM_ENTER_BIT);
            if (opline != 0) {
                // ZEND_VM_ENTER() / ZEND_VM_LEAVE()
                execute_data = EG(current_execute_data);
            } else {
                // ZEND_VM_RETURN()
                return;
            }
        }
    }

    // example op handler
    const zend_op * ZEND_INIT_FCALL_SPEC_CONST_HANDLER(zend_execute_data *execute_data, const zend_op *opline) {
        // opline already loaded

        // instruction execution

        // dispatch
        // ZEND_VM_NEXT_OPCODE():
        return ++opline;
    }

bench.php is 23% faster on Linux / x86_64, 18% faster on MacOS / M1. Symfony Demo is 2.8% faster.

When using the HYBRID VM, JIT'ed code stores execute_data/opline in two fixed callee-saved registers and rarely touches EX(opline), just like the VM.

Since the registers are callee-saved, the JIT'ed code doesn't have to save them before calling other functions, and can assume they always contain execute_data/opline. The code also avoids saving/restoring them in prologue/epilogue, as execute_ex takes care of that (JIT'ed code is called exclusively from there).

The CALL VM can now use a fixed register for execute_data/opline as well, but we can't rely on execute_ex to save the registers for us as it may use these registers itself. So we have to save/restore the two registers in JIT'ed code prologue/epilogue.

Closes GH-17952
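The handler/loop protocol above can be exercised in a small standalone program. This is a toy sketch, not the Zend VM. `toy_frame`, `toy_op`, `current_frame`, `TOY_ENTER_BIT` and the `op_*` handlers are hypothetical stand-ins for zend_execute_data, zend_op, EG(current_execute_data), ZEND_VM_ENTER_BIT and real opcode handlers. For simplicity, call frames are consecutive elements of a caller-provided array. The sketch shows handlers returning the next opline, with ENTER/LEAVE/RETURN signalled by tagging bit 0 of the returned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct toy_frame toy_frame;
typedef struct toy_op toy_op;

/* Handlers receive the frame and the opline, and return the next opline. */
typedef const toy_op *(*toy_handler)(toy_frame *ex, const toy_op *opline);

struct toy_op {
	toy_handler handler;
	long operand;
	const toy_op *target;   /* callee code, used by op_call only */
};

struct toy_frame {
	long acc;
	toy_frame *prev;        /* caller frame */
	const toy_op *ret;      /* opline to resume in the caller */
};

/* oplines are at least 2-byte aligned, so bit 0 is free for the tag. */
#define TOY_ENTER_BIT ((uintptr_t) 1)

static toy_frame *current_frame;   /* stand-in for EG(current_execute_data) */

static const toy_op *toy_tag_enter(const toy_op *opline)
{
	assert(((uintptr_t) opline & TOY_ENTER_BIT) == 0);
	return (const toy_op *) ((uintptr_t) opline | TOY_ENTER_BIT);
}

/* Like ZEND_VM_NEXT_OPCODE(): just return the next opline. */
static const toy_op *op_add(toy_frame *ex, const toy_op *opline)
{
	ex->acc += opline->operand;
	return opline + 1;
}

/* Like ZEND_VM_ENTER(): switch frames, then ask the loop to reload ex. */
static const toy_op *op_call(toy_frame *ex, const toy_op *opline)
{
	toy_frame *callee = ex + 1;
	callee->acc = 0;
	callee->prev = ex;
	callee->ret = opline + 1;
	current_frame = callee;
	return toy_tag_enter(opline->target);
}

/* Like ZEND_VM_LEAVE(): pass the result up and switch back to the caller. */
static const toy_op *op_leave(toy_frame *ex, const toy_op *opline)
{
	(void) opline;
	ex->prev->acc += ex->acc;
	current_frame = ex->prev;
	return toy_tag_enter(ex->ret);
}

/* Like ZEND_VM_RETURN(): a tagged NULL tells the loop to exit. */
static const toy_op *op_return(toy_frame *ex, const toy_op *opline)
{
	(void) ex;
	(void) opline;
	return toy_tag_enter(NULL);
}

static long toy_execute(toy_frame *ex, const toy_op *opline)
{
	current_frame = ex;
	for (;;) {
		opline = opline->handler(ex, opline);
		if ((uintptr_t) opline & TOY_ENTER_BIT) {
			opline = (const toy_op *) ((uintptr_t) opline & ~TOY_ENTER_BIT);
			if (opline == NULL) {
				return current_frame->acc;
			}
			ex = current_frame;   /* reload after ENTER/LEAVE */
		}
	}
}
```

Consider a caller that adds 1, calls a callee that adds 100 and leaves, then adds 10 and returns. Its frame ends with an accumulator of 111. Note that, as in the change described above, the common path (op_add) never touches memory to advance the opline.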
This implements the technique described in https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html, which addresses the issues described in http://lua-users.org/lists/lua-l/2011-02/msg00742.html. Python recently implemented this, which resulted in a 9-15% performance improvement: https://blog.reverberate.org/2025/02/10/tail-call-updates.html.
It turns out that @dstogov already addressed these issues using a different technique, enabled when compiling with GCC, so this will not improve performance with that compiler, but it makes PHP on Clang as fast as on GCC.
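The technique from the linked posts, combined with the trampoline this PR uses for helpers with extra parameters (described under Changes), can be sketched as a small standalone toy. This is not the PR's code: `toy_frame`, `toy_op`, `toy_trampoline`, the `op_*` handlers, `add_scaled_helper` and the `TOY_*` macros are hypothetical. Each handler ends by tail calling the next handler through a function pointer. Where Clang supports musttail, that call is guaranteed to compile to a jump, and preserve_none is added where Clang reports support for it. With other compilers, both macros expand to nothing. The program is still correct, but the C stack then grows with every executed op.

```c
#include <assert.h>
#include <stddef.h>

/* Opt into musttail/preserve_none only where Clang reports support. */
#if defined(__clang__)
# if __has_attribute(musttail)
#  define TOY_MUSTTAIL __attribute__((musttail))
# endif
# if __has_attribute(preserve_none)
#  define TOY_PRESERVE_NONE __attribute__((preserve_none))
# endif
#endif
#ifndef TOY_MUSTTAIL
# define TOY_MUSTTAIL        /* fallback: ordinary call, stack grows per op */
#endif
#ifndef TOY_PRESERVE_NONE
# define TOY_PRESERVE_NONE   /* fallback: default calling convention */
#endif

typedef struct toy_op toy_op;
typedef struct { long acc; } toy_frame;   /* stand-in for execute_data */

/* All handlers share one signature and calling convention, as musttail requires. */
typedef long (TOY_PRESERVE_NONE *toy_handler)(toy_frame *ex, const toy_op *opline);

struct toy_op {
	toy_handler handler;
	long operand;
};

/* What a helper hands back: the next opline and the handler to tail call. */
typedef struct {
	const toy_op *opline;
	toy_handler handler;
} toy_trampoline;

/* In handlers: jump straight into the next handler, no dispatch loop. */
#define TOY_DISPATCH(ex, opline) \
	TOY_MUSTTAIL return (opline)->handler((ex), (opline))

static TOY_PRESERVE_NONE long op_add(toy_frame *ex, const toy_op *opline)
{
	ex->acc += opline->operand;
	opline++;
	TOY_DISPATCH(ex, opline);
}

/* The only handler that really returns: it ends the chain of tail calls. */
static TOY_PRESERVE_NONE long op_halt(toy_frame *ex, const toy_op *opline)
{
	(void) opline;
	return ex->acc;
}

/* In helpers with extra parameters (a different signature, so handlers cannot
 * musttail into them), the same macro name returns the trampoline value. */
#undef TOY_DISPATCH
#define TOY_DISPATCH(ex, opline) \
	return (toy_trampoline){ (opline), (opline)->handler }

static toy_trampoline add_scaled_helper(toy_frame *ex, const toy_op *opline, long scale)
{
	ex->acc += opline->operand * scale;
	opline++;
	TOY_DISPATCH(ex, opline);
}

#undef TOY_DISPATCH

/* The handler calls the helper normally (the helper returns, so the stack
 * does not grow), then tail calls the handler the helper handed back. */
static TOY_PRESERVE_NONE long op_add_x3(toy_frame *ex, const toy_op *opline)
{
	toy_trampoline next = add_scaled_helper(ex, opline, 3);
	TOY_MUSTTAIL return next.handler(ex, next.opline);
}

/* Entry point: an ordinary call into the first handler. */
static long toy_execute(toy_frame *ex, const toy_op *opline)
{
	assert(opline != NULL);
	return opline->handler(ex, opline);
}
```

For the program add 2, add_x3 3, add 1, halt, `toy_execute` returns 12. The accumulator goes to 2, then 2 + 3*3 = 11 via the helper, then 12. The struct returned by the helper holds two pointers, which mainstream 64-bit ABIs return in two registers.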
Benchmarks
Zend/bench.php
PHP/Clang was 77% slower than PHP/GCC in this benchmark, now only 1% slower.
Symfony Demo
PHP/Clang was 5% slower than PHP/GCC in this benchmark.
Current interpreter
The interpreter is generated by Zend/zend_vm_gen.php. Multiple modes are supported, but the default (and only supported) mode is the hybrid one, which generates both a call-based interpreter and a GCC-specific interpreter. Which one is actually compiled depends on the compiler being used.

In the call-based interpreter, opcode handlers are separate functions, the next opline to execute is stored in execute_data, and execute_data is passed as an argument to op handlers:

Handlers typically load execute_data->opline, execute the operation, update execute_data->opline, and return.

There is quite a lot of overhead: the call instruction pushes a return address on the stack, the function saves/spills registers, etc. E.g. the code of ZEND_INIT_FCALL_SPEC_CONST_HANDLER() starts with:

Also, opline needs to be loaded/stored from/to memory.

The GCC interpreter manages to eliminate this overhead. opline->handler is a computed-goto target, which calls the actual handler. Hot handlers are inlined, FP/IP (execute_data/opline) are register variables, and handlers take no arguments and have no return value:

Changes
Here I add a variation of the call-based interpreter, enabled when using clang-19:

* execute_data and opline are passed as op handler arguments, so they are always in registers unless they are spilled on the stack
* preserve_none calling convention: reduces register saves/spills
* The musttail attribute is used to force tail calling.

Unfortunately, musttail rejects calls to functions whose signature is not compatible with the caller's, so it's not possible to tail call VM helpers that have extra parameters. Instead, we use a trampoline when calling these: the helper returns a struct{opline,handler} (in two registers), which is then tail called by the caller. Since helpers always return (unless they call other helpers), the stack doesn't grow indefinitely:

I introduce a ZEND_VM_DISPATCH() macro that is used by ZEND_VM_NEXT_OPCODE() and related macros. This macro tail calls the next opline by default. In VM helpers with extra parameters, ZEND_VM_DISPATCH() is redefined to return the trampoline value instead:

Caveats
__attribute__((preserve_none)) is not stable, so we should probably not use it in exported functions. This has implications for the JIT and for user opcode handlers. We might need to generate wrappers with a stable calling convention.
opline as argument and __attribute__((preserve_none))) to reduce the differences between the call-based interpreter and the Clang one.

TODO
* opline as argument, without other changes. Maybe do that by default? (Pass opline as argument to opcode handlers in CALL VM #17952)
* __attribute__((preserve_none)), without other changes

Future scope:
* preserve_none / preserve_most / slow paths

PRs
I'm splitting this into smaller PRs: