-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Interrupt while internal frame is on the stack #14627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Marking it ready for review for GitHub to send notifications to reviewers, who I need feedback from, even though it's not ready for merge. |
No. VM interrupts were designed to allow safe exit (or execution transfer) after some asynchronous event without repeatable checks after each VM instruction. Before the transfer, it's possible to capture the current execution state (save execut_data and opline) and then transfer execution back. This way we implement PHP callbacks for posix signal.
Observer callback already may be called through ZEND_OBSERVER_FCALL_END. Why don't you check interrupt there. |
@dstogov Thank you for your feedback. However, I don't think you understood my point about If we have a function like: ZEND_FUNCTION(example) {
// ...
zend_call_function(/* ... */);
// ...
} And the interrupt handler is called during this function, which is during a
I don't think so. It removes the CHECK_SYMBOL_TABLES()
OPLINE = opline + 1; Where is the other interrupt check?
I would make the same changes to
Yes, but installing an observer for every internal function fcall is expensive. My whole goal is to enable performance optimizations in the engine by avoiding |
In reality it's not "middle of ICALL handler" but end of function handler. The VM stack is in a safe state here. The signals now would simply be on top of the internal function, without any state being actually invalid. Signal callbacks are not "capture the current execution state (save execut_data and opline) and then transfer execution back", but simply a frame pushed on top of the current VM stack. This wouldn't be safe anyway, unless you also back up the C stack, which fibers do, which includes the cleanup of the ICALL handler. In fact, I also consider this sort-of helpful for stacktraces in signal handlers - these never include the internal function where the signal occurred, but only the line after. This would make it much more correct, especially for internal functions taking a bit of time. |
@morrisonlevi you are right about the double check. (I was wrong). Anyway, the actual call to
With this patch, the change of @morrisonlevi I didn't understand what fiber related problem you are trying to solve. |
@dstogov Thanks for your reply. Consider code with this pattern: ZEND_FUNCTION(/* ... */) {
// ...
zend_call_function(/* ... */);
// ...
} Remember that for ~4 years
If the So what I am saying, is that my patch doesn't change anything meaningful for extensions which want to jump to other places in the VM. They would need to do more work than just saving and swapping VM state, they also need to save C stack and registers, and that's already the case because of I cannot write a test for you, because I don't have an extension which does what you are saying. But if you were to run it while calling such functions listed above, you should be able to see what I'm talking about. For instance, make a large array and do |
I think I started to understand you (sorry, this took a while). I see, commit 555489d added interrupt check into Your PR doesn't fix anything, but just allows to set an "observing interrupt handler" that may see the interrupted internal function and its arguments. Right? |
Yes, that's correct. This would be a nice improvement for my work. I will work on adding support to the other fcall handlers and adding an entry to UPGRADING.INTERNALS. |
Do your profiler generates VM interrupts itself? (I mean these are not regular POSIX signal or timeout). If this is the case, I would try not to use VM interrupts, but read the PHP backtrace info at first place. Accepting this PR (when complete) is not a big problem. It should be mentioned that interrupt handlers are not allowed to switch |
We generate VM interrupts from another thread, and atomically set the VM interrupt flags of active PHP threads.
There's not a way to do this safely from the other thread. If we switch to a signal which is delivered to the PHP thread and walk during the signal handler (rather than a VM interrupt), we still have to deal with those consequences:
It seems best to only do this in VM interrupt handlers.
I'll have you review the note, thank you. |
12b9b76
to
5468218
Compare
Zend/zend_vm_def.h
Outdated
@@ -4160,7 +4178,9 @@ ZEND_VM_C_LABEL(fcall_by_name_end): | |||
zend_rethrow_exception(execute_data); | |||
HANDLE_EXCEPTION(); | |||
} | |||
ZEND_VM_SET_OPCODE(opline + 1); | |||
// todo: under what circumstances do ZEND_USER_FUNCTIONs take this code path? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ZEND_DO_FCALL_BY_NAME
, I think this isn't ever taken by user functions. All the expansions of ZEND_VM_ENTER_EX
seem to return except for when HYBRID_NEXT
is used, where it does a jump. I'm not sure where the jump goes. Is this dead code for user code? @dstogov
Edit: I updated the VM to only take this code and no tests broke broke in Zend
. This strengthens my belief it's dead code but the VM gen and related stuff is quite clever sometimes, hence the ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code should be never executed for ZEND_USER_FUNCTION, but this might be changed by deprecated attribute RFC (see #11293)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: The #[\Deprecated]
attribute is now merged.
Zend/zend_vm_def.h
Outdated
@@ -4271,7 +4299,9 @@ ZEND_VM_C_LABEL(fcall_end): | |||
HANDLE_EXCEPTION(); | |||
} | |||
|
|||
ZEND_VM_SET_OPCODE(opline + 1); | |||
// todo: under what circumstances do ZEND_USER_FUNCTIONs take this code path? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ZEND_DO_FCALL
, this seems to get taken when zend_execute_ex
is set to something other than execute_ex
. Otherwise, it seems the same as my other todo (ZEND_VM_ENTER_EX
gets expanded into a return or jmp, so this code is never reached).
If that assumption about ZEND_VM_ENTER_EX
is correct, then I suppose the fix is to do the interrupt at the end of the else branch of if (EXPECTED(zend_execute_ex == execute_ex))
.
If the assumption is wrong, then I need guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you need to check interrupt only for internal functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that for the case of a user function which takes the zend_execute_ex
path, we don't need an interrupt check at all in this opcode handler? If so, why is that?
This way you may handle one event for a loop iteration or internal function call, but this is up to you...
Previously, on POSIX we bailed out right from an async timeout signal and on Windows we set flag and checked it after VM instruction. VM interrupts unified implementation and minimized overhead of checks by doing them only at jumps and calls. |
b444369
to
5e379ad
Compare
I accidentally removed some VM checks in my latest change in jump opcodes. I'll fix it soon. |
UPGRADING.INTERNALS
Outdated
* DO_ICALL, DO_FCALL, and DO_FCALL_BY_NAME now call zend_interrupt_function | ||
while the internal frame is still on the stack. This means interrupt handlers | ||
will now see the internal call. If your interrupt handler does something like | ||
switching EG(current_execute_data), it probably needs to do something more | ||
similar to a fiber switch, or at least not do so when internal frames are on | ||
top anymore. | ||
|
||
* RETURN, RETURN_BY_REF, GENERATOR_RETURN, and CALL_TRAMPOLINE now do a check | ||
for a VM interrupt and call zend_timeout/zend_interrupt_function after the | ||
ZEND_OBSERVER_FCALL_END call. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dstogov Can you check the wording here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove "it probably needs to do something more similar to a fiber switch" suggestion and keep only the internal frame related restriction.
I didn't agree to add interrupt checks at user function RETURN.
Zend/zend.h
Outdated
/* Called by the VM in certain places like at the end of function calls or | ||
* jumps, if EG(vm_interrupt) has been set. | ||
* | ||
* If this is used to switch the EG(current_execute_data), such as implementing | ||
* a coroutine scheduler, then it needs to check the top frame to see if it's | ||
* an internal function. If so, it needs to do something similar to a fiber | ||
* switch, or at the very least delay switching until the internal frame is no | ||
* longer on top. Prior to PHP 8.0, this check was not necessary. In PHP 8.0 | ||
* zend_call_function started calling zend_interrupt_function, and in 8.4 the | ||
* DO_*CALL* opcodes started calling the zend_interrupt_function while the | ||
* internal frame is still on top. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dstogov And also check the wording here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zend_interrupt_function
may be called by VM if EG(vm_interrupt)
flag was asynchronously set. VM checks for EG(vm_interrupts)
at PHP loop header (to break long-running loops), at PHP user function entry (to break recursion), and after internal function calls (that may take long time).
I would also remove the fiber related suggestion. At least I don't know how to do this...
Zend/zend_vm_def.h
Outdated
@@ -4437,6 +4443,7 @@ ZEND_VM_INLINE_HANDLER(62, ZEND_RETURN, CONST|TMP|VAR|CV, ANY, SPEC(OBSERVER)) | |||
} | |||
ZEND_OBSERVER_SAVE_OPLINE(); | |||
ZEND_OBSERVER_FCALL_END(execute_data, return_value); | |||
ZEND_SIMPLE_INTERRUPT_CHECK(execute_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No... Why did you introduce this check? Just for a yet another observer feature?
It's going to make slowdown. :(
JIT should consistently reflect all these VM changes as well. :(
Instead you might check EG(vm_interrupt)
directly in zend_observer_fcall_end()
, but it's inlined anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed better to have consistent rules on when interrupts are triggered, and also made the code simpler. I will pursue other options, but know that certain code will definitely get uglier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My last comment wasn't precise enough. I meant if we keep interrupts in roughly all the same places as on master (except for the change of making the internal one happen while the internal frame is still on the stack), then it will be uglier. If we're fine dropping some cases, then it's not necessarily worse. For instance, on master we can call the interrupt handler if zend_execute_ex
has been set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interrupts are introduced to perform a "serialized" break of interpretation of a long running script with minimal number of checks. We check interrupts only in 3 places - loops, recursive calls, returns from internal functions. This handles possible "infinity" loops and long running internal functions.
There is no reason to check interrupts in RETURN, because long-running user functions should be already caught by loop/call checks.
7ab12bf
to
c5d3a9c
Compare
As per @dstogov's feedback, the interrupt checks on returns are removed. This moves the interrupt checks for internal functions for Interrupt checks have not been added for cases like Edit: I still need to update JIT. Forgot about that momentarily. But could you please review everything other than that? |
The VM part of the PR looks fine for me now. |
Co-authored-by: Bob Weinand <bobwei9@hotmail.com>
@dstogov I updated the JIT with Bob's help. Does it look okay to you? |
if (!zend_jit_check_timeout(jit, opline + 1, exit_addr)) { | ||
return 0; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see yet another reason why interrupts were designed and implemented that way.
We used just two assembler instruction to check for interrupts and in case of interrupt we switched to interpreter.
cmp EG(vm_interrupts), $0
jne exit_N
Your patch requires 4 instructions (2 additional instruction for each internal function call).
cmp EG(vm_interrupts), $0
je .L1
movabs %rax, $zend_interrupt_or_timeout)
call %rax
.L1
Of course this will affect JIT code size and performance :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the surrounding code look like? I might have expected this to use jne
to jump to some other place which then jumps back to .L1
, since this is cold (and I believe we marked it as such in the JIT IR). This way, the CPU impact is isolated to when the branch is taken (not often). Something like:
cmp EG(vm_interrupts), $0
jne .L2
.L1
; ...
.L2
movabs %rax, $zend_interrupt_or_timeout)
call %rax
jmp .L1
For the happy paths without interrupts, this would be the same performance (minus possible cache effects from sightly increased code size which I think is unlikely) as master. But maybe there's a lot of asm in ; ...
and it puts the cold code pretty far away. I would personally still be okay with penalizing the interrupt path, but I can see why the optimizer might not make that choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, even with the proposed optimisation this makes the code worse (at least bigger).
I don't like to do this just for a new feature of third-party project.
This will affect every PHP installation (without your profiler).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use an exit when zend_interrupt_function
is null (so the generated code is not changed), and a call to zend_interrupt_or_timeout
otherwise. This way there is no regression when zend_interrupt_function
is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good idea. I think many users will have the pcntl
extension, though, which sets pcntl_interrupt_function
. Are we okay with special casing pcntl_interrupt_function
? The code would need some reworking but logically the check is pcntl_loaded && zend_interrupt_function == pcntl_interrupt_function && orig_interrupt_function == NULL
then it's the same as !zend_interrupt_function
.
I've pushed code that does the basic specialization (nothing special for ext/pcntl).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add a flag to tell the JIT to not exit to VM on interrupts. Then we use the new code when the flag is set, and the old code otherwise. PCNTL would not set the flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dstogov Can I get your opinion on the current patch which implements basic zend_interrupt_function
specialization in JIT?
Zend/zend_execute.h
Outdated
*/ | ||
ZEND_API ZEND_COLD void ZEND_FASTCALL zend_interrupt_or_timeout(zend_execute_data *call); | ||
|
||
static zend_always_inline void zend_interrupt_or_timeout_check(zend_execute_data *call) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This function is similar to ZEND_VM_INTERRUPT_CHECK()
and ZEND_VM_LOOP_INTERRUPT_CHECK()
, but with subtle differences.
I think it would help understanding if this function was explicitly implemented as a variant of these two macros. E.g. it could be implemented as a macro named ZEND_VM_FCALL_INTERRUPT_CHECK()
.
Zend/zend_execute.h
Outdated
/* Call this to handle the timeout or the interrupt function. It will not clear | ||
* the EG(vm_interrupt). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you rely on zend_interrupt_function
to clear vm_interrupt
in this case? Should we still reset vm_interrupt
in case of timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed up a commit while you were reviewing ^_^
The store is done in the VM with atomic exchange which is better than a load + store individually. However in JIT, I don't have an IR instruction for atomic exchange, so there it does a load + store.
if (!zend_jit_check_timeout(jit, opline + 1, exit_addr)) { | ||
return 0; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use an exit when zend_interrupt_function
is null (so the generated code is not changed), and a call to zend_interrupt_or_timeout
otherwise. This way there is no regression when zend_interrupt_function
is null.
I'd like to merge this. I'm not sure how Dmitry was getting asm of the JIT, and I'm on aarch64 anyway, so I can't check that I've actually fixed the asm when no interrupt handler is set. Is this acceptable to everyone? Anyone want to check anything else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can dump the generated ASM with -d opcache.jit_debug=1
(ZEND_JIT_DEBUG_ASM
) (this requires building with --with-capstone
). I've checked on x86_64, and I confirm that the code generated by this branch is the same as in master when zend_interrupt_function
is NULL.
This looks acceptable to me apart from a nit pick and a question, but I would wait for Dmitry's review.
It is difficult to determine performance of atomics sometimes. In this case, the separate load+store is still correct, and a load does not cause a modification, and might be faster for some platforms than an exchange. A load+store is slower than an exchange, but we're fine trading the penalty to the slow path and keeping the happy path faster.
6327dc8
to
8fa433a
Compare
@dstogov Thanks to Arnaud's help, the PR is improved. Could you please review this again? |
Merging it after talking to Arnauld given that Dmitry stayed silent and RC1 is nearing. |
Final Changes
DO_ICALL
,DO_FCALL
, andDO_FCALL_BY_NAME
have been changed tocheck for VM interrupts while the internal frame is still on top of the
call stack.
Technically, this removes some interrupt checks as well, but only for
fringe cases such as when
zend_execute_ex
is not null.Adds docs for
zend_interrupt_function
, noting the restrictions onswitching
EG(current_execute_data)
when internal frames are on top.Updates UPGRADING.INTERNALS.
Background and Original Description
If the
EG(vm_interrupt)
flag is set, the goal is forzend_interrupt_function
to be called while the internal function call frame is still on the stack. The reason that profilers and other tools which set VM interrupts will never get that leaf frame included in the backtrace, which is often important. For instance, regular expression, database, and file I/O functions tend to be internal functions which consume time (whether wall- or cpu-time) and are therefore relevant to a profiler.The obvious work-arounds are undesirable:
zend_execute_internal
to check it manually while the frame is on the stack will disable certain optimizations, and in PHP 8.4 this becomes worse with frameless calls. This is partly why I'm visiting this now. This is what I do today.It seems to me that triggering the interrupt while the call is still on the frame makes the most sense. Once upon a time, I tried to make this change but I was unable to find the PR and/or discussion around it. I know that someone brought up that it was designed this way for an interrupting scheduler to be able to switch the VM away to another place but:
zend_call_function
has handled interrupts for 4 years. I'm sure such a tool would also have an issue with that case if it has problems with this one.So I'm looking for feedback on my patch:
ZEND_DO_ICALL
. I am waiting for feedback before working on the other call opcodes.And also more generally: