-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[HttpCache] occasionnal 503 error on locking request #15813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I have seen this as well, now watching. |
It also happens to me. Usually in development when I make a change in the code, clearing the cache fixes the issue. |
After a blast of 503 I saw some lock files weren't deleted. No idea why for the moment. |
This shouldn't be the case. We use |
I'm pretty sure the problem is caused by |
Maybe the lock files are created and then the process holding the lock crashes/terminates for some reason without removing the lock file? Do you see this for a particular URL only or for different ones? |
I've just had one of these occurences and my logs show that at the beginning of the event, the kernel oom-killer terminated an Apache child process (I'm using the prefork MPM with the PHP Apache module). That would support the hypothesis of stale .lck files lying around after sudden termination of PHP processes. I'd suggest switching over to |
The fstat cache seems to be an issue as well, see #16312 |
All heavily accessed URL using cache.
I'm praying ^^ |
Even with #16312 we have the possibility that a forcefully terminated process leaves a stale .lck file lying around. |
I only saw this once so it's not my main concern with this issue. |
…k file has been released (mpdude) This PR was squashed before being merged into the 2.3 branch (closes #16312). Discussion ---------- [HttpKernel] clearstatcache() so the Cache sees when a .lck file has been released | Q | A | ------------- | --- | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | #15813 | License | MIT | Doc PR | n/a I've been trying to debug #15813 and modified the Store in a way to keep unique request IDs in the .lck file. That way, I was hoping to find out which request is blocking and/or if the request is actually still running. It turned out that `is_file()` would claim that a lock file still exists, but a subsequent attempt to read the information from that file returned "file not found" errors. So, my assumption is that the `is_file()` result is based on the fstat cache and wrong once a process has seen the lock file. @jakzal said in #15813 (comment) that `unlink()`ing the lock file should clear the statcache, but I doubt this is true across PHP processes. Commits ------- 982710f [HttpKernel] clearstatcache() so the Cache sees when a .lck file has been released
…k file has been released (mpdude) This PR was squashed before being merged into the 2.3 branch (closes #16312). Discussion ---------- [HttpKernel] clearstatcache() so the Cache sees when a .lck file has been released | Q | A | ------------- | --- | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | #15813 | License | MIT | Doc PR | n/a I've been trying to debug #15813 and modified the Store in a way to keep unique request IDs in the .lck file. That way, I was hoping to find out which request is blocking and/or if the request is actually still running. It turned out that `is_file()` would claim that a lock file still exists, but a subsequent attempt to read the information from that file returned "file not found" errors. So, my assumption is that the `is_file()` result is based on the fstat cache and wrong once a process has seen the lock file. @jakzal said in symfony/symfony#15813 (comment) that `unlink()`ing the lock file should clear the statcache, but I doubt this is true across PHP processes. Commits ------- 982710f [HttpKernel] clearstatcache() so the Cache sees when a .lck file has been released
…ased after two second wait (jakzal) This PR was merged into the 2.3 branch. Discussion ---------- [HttpKernel] Lookup the response even if the lock was released after two second wait | Q | A | ------------- | --- | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | - | License | MIT | Doc PR | - While looking into #15813 I noticed that we [wait for the lock to be released for five seconds, but then only do a lookup if the lock was released in two seconds](https://github.com/symfony/symfony/blob/fa604d3c6f16f264863a42c200391ab996640296/src/Symfony/Component/HttpKernel/HttpCache/HttpCache.php#L540-L562), no more. I think it's worth to make both values the same (so either two or five seconds). I see no reason why we should wait for the lock for five seconds, but then only do a lookup if we waited for two. One way the wait either takes too long, the other way we loose the opportunity to actually return a response. Commits ------- 9963170 [HttpKernel] Lookup the response even if the lock was released after 2 seconds
This PR was merged into the 2.7 branch. Discussion ---------- [HttpKernel] Use flock() for HttpCache's lock files | Q | A | ------------- | --- | Branch? | 2.7 | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | no | Fixed tickets | #16777, #15813 and #16312 are also related | License | MIT | Doc PR | When a PHP process crashes or terminates (maybe the OOM killer kicks in or other bad things ™️ happen) while the `HttpCache` holds a `.lck` file, that lock file may not get `unlink()`ed. The result is that other requests trying to access this cache entry will see a few seconds delay while waiting for the lock; they will eventually continue but send 503 status codes along with the response. The sudden buildup of PHP processes caused by the additional delay may cause further problems (sudden load increase). As `LockHandler` is using `flock()`-based locking, locks should be released by the OS when the PHP process terminates. I wrote this as bugfix against 2.7 because every once in a while I encounter situations (not always reproducible) where `.lock` files are left over and keep the cache locked. Commits ------- 2668edd [HttpKernel] Use flock() for HttpCache's lock files
I have this problem for a long time now: requests using HttpCache sometimes get caught in the waiting loop. After a bit of monitoring I'm sure the request creating the lock doesn't take more than 2s so parallel requests shouldn't return a 503 code.
I couldn't reproduce the problem on my computer, maybe because it's fairly rare (say 0.01% of requests). Couldn't this be due to a missing
clearstatcache
in theisLocked
function?The text was updated successfully, but these errors were encountered: