Skip to content

PHP-FPM segfault due to after free usage of child->ev_std(out|err) #10461

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SandakovMM opened this issue Jan 27, 2023 · 13 comments · Fixed by ThePHPF/thephp.foundation#90
Closed
Assignees

Comments

@SandakovMM
Copy link

SandakovMM commented Jan 27, 2023

Description

Hello
I have encountered an issue on multiple web-hosting servers after updating to PHP 8.1.14. The PHP-FPM master process is crashing periodically, approximately once a day. This issue does not occur with PHP 8.0.
A GDB backtrace shows that the crash is happening in the fpm_event_epoll_wait function:

(gdb) bt
--
#0 0x0000010c00000000 in ?? ()
#1 0x0000556f30ed496e in fpm_event_epoll_wait (queue=<optimized out>, timeout=<optimized out>) at sapi/fpm/fpm/events/epoll.c:141
#2 0x0000556f30ec6201 in fpm_event_loop (err=err@entry=0) at sapi/fpm/fpm/fpm_events.c:427
#3 0x0000556f30ec0167 in fpm_run (max_requests=0x7ffccc6fa19c) at sapi/fpm/fpm/fpm.c:113
#4 0x0000556f30b8430c in main (argc=2, argv=0x7ffccc6fa6d8) at sapi/fpm/fpm/fpm_main.c:1828

Unfortunately, the steps to reproduce the issue are unclear. I can see only warnings about the child limit reached in the error.log. But the warning is common and happens from time to time even when the problem doesn't happen, so I assume they are not connected to the issue.
I have reviewed the changes between PHP 8.1.13 and 8.1.14 and suspect that the issue may be caused by this commit. I have reverted this commit and the issue did not occur over the past week. However, I have been unable to determine the specific problem with these changes.

Could you please help investigate the issue?

PHP Version

PHP 8.1.14

Operating System

Ubuntu 20, Ubuntu 22, Centos 7, AlmaLinux 8

@bukka
Copy link
Member

bukka commented Feb 5, 2023

@SandakovMM This is interesting as the backtrace points to the actual even firing so it's like it crashes during the callback call which doesn't make much sense to me. It might be potentially some memory corruption. The original commit introduced a bit of casting so maybe that could be the source of the issue. To verify that I prepared a patch in #10510 . Would you be able to cherry pick its only commit, apply it to 8.1.14 and check if you see any crashes after that?

If it doesn't help could you share your FPM config and could you also enable a debug log level and share its content around the time of the future crash?

The commit that is causing your commit is actually a fix for #8517 so we don't want to just revert it. I'd prefer try to fix it though.

@bukka
Copy link
Member

bukka commented Feb 5, 2023

@Poulpatine @triplesixman As you have been testing the commit causing this crash for longer than others, I was wondering if you noticed any further crash and if you are also getting from time to time child limit reached warning (just checking if there can be some relation between the crash and warning potentially)?

@SandakovMM
Copy link
Author

SandakovMM commented Feb 5, 2023

Would you be able to cherry pick its only commit, apply it to 8.1.14 and check if you see any crashes after that?

Sure, I will do my best to l check it in two weeks. Thank you

@SandakovMM
Copy link
Author

Hey,
I am sorry, but unfortunately, we have no resources to check this patch right now. We will do it as soon as possible.

@bukka
Copy link
Member

bukka commented Mar 17, 2023

@SandakovMM Hi, just wanted to check if you would be able to share your FPM config and a bit more info about your app and traffic that you are getting? I'm asking because we did not have any other report about this so there might be something specific to your usage.

@SandakovMM
Copy link
Author

Hay,
I should add some context. We provide our product to hosting companies. Some of them meet the problem and asked for our help. I've checked their FPM configures and could not find anything suspicious that would be complete for all of them. So, unfortunately, I have nothing to share with you. At least for now, sorry.

@SandakovMM
Copy link
Author

Good news, we are ready to check if the patch helps. I will keep you updated with the results within the next 2-3 weeks.

@github-actions
Copy link

github-actions bot commented Apr 5, 2023

No feedback was provided. The issue is being suspended because we assume that you are no longer experiencing the problem. If this is not the case and you are able to provide the information that was requested earlier, please do so. Thank you.

@github-actions github-actions bot closed this as completed Apr 5, 2023
@iluuu1994 iluuu1994 reopened this Apr 5, 2023
@bukka bukka changed the title PHP-FPM seagfault after update from PHP 8.1.13 to PHP 8.1.14 PHP-FPM seagfault due to child->ev_std(out|err) being used after the child is freed Apr 10, 2023
@bukka bukka changed the title PHP-FPM seagfault due to child->ev_std(out|err) being used after the child is freed PHP-FPM segfault due to after free usage of child->ev_std(out|err) Apr 10, 2023
@bukka
Copy link
Member

bukka commented Apr 10, 2023

@SandakovMM thanks for trying to test the patch but unfortunately it is not going to help.

I just figured out what the actual problem is. It is basically due to use of even that is part of the child in

fpm_event_set(&child->ev_stdout, child->fd_stdout, FPM_EV_READ, fpm_stdio_child_said, (void *) (intptr_t) child->pid);
fpm_event_add(&child->ev_stdout, 0);
fpm_event_set(&child->ev_stderr, child->fd_stderr, FPM_EV_READ, fpm_stdio_child_said, (void *) (intptr_t) child->pid);
fpm_event_add(&child->ev_stderr, 0);
.

The problem is that when child gets freed before the event, the event is also freed so it fails that check.

To fix it I think we need to separate the events from child and keep them under the worker pool potentially so they can outlive the child. I think we should unregister them as well. It is a bit tricky to come with some sensible clean up though. Need to think about it a bit more.

@bukka
Copy link
Member

bukka commented Apr 15, 2023

@SandakovMM So I just came up with a slightly different approach which is in #11084 . It is basically delaying freeing of the children that were killed / crashed or descaled. It needs more testing so if you are able to test it, that would be appreciated.

I'm still having a bit of issue to recreate this problem locally. I tried various cases but so far I haven't managed to recreate the race condition. Do you actually know if you see children crashing a lot in the logs and if they produce lots of output to stderr?

@SandakovMM
Copy link
Author

Do you actually know if you see children crashing a lot in the logs and if they produce lots of output to stderr?

As far as I remember the problem was actual for bunch of various applications. So I believe some of them could write into stdout or stderr much, but not all of them. It might be a good way to repeat the race condition anyway in my point of view.

@bukka bukka closed this as completed in 1029537 May 13, 2023
@bukka
Copy link
Member

bukka commented May 14, 2023

I committed the mentioned fix as I think it's hopefully safe. It no longer contains the changes that you reverted as it was not right fix. The change will be part of 8.1.20 and 8.2.7. If you see the issue still happening after using those version, please comment here and I will re-open it.

@SandakovMM
Copy link
Author

Great! Thank you very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
@bukka @iluuu1994 @SandakovMM @Girgias and others