Skip to content

[HttpCache] occasionnal 503 error on locking request #15813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MatTheCat opened this issue Sep 16, 2015 · 11 comments
Closed

[HttpCache] occasionnal 503 error on locking request #15813

MatTheCat opened this issue Sep 16, 2015 · 11 comments

Comments

@MatTheCat
Copy link
Contributor

I have this problem for a long time now: requests using HttpCache sometimes get caught in the waiting loop. After a bit of monitoring I'm sure the request creating the lock doesn't take more than 2s so parallel requests shouldn't return a 503 code.

I couldn't reproduce the problem on my computer, maybe because it's fairly rare (say 0.01% of requests). Couldn't this be due to a missing clearstatcache in the isLocked function?

@mpdude
Copy link
Contributor

mpdude commented Sep 17, 2015

I have seen this as well, now watching.

@gnugat
Copy link
Contributor

gnugat commented Sep 20, 2015

It also happens to me. Usually in development when I make a change in the code, clearing the cache fixes the issue.
Somehow I can't reproduce it :(

@MatTheCat
Copy link
Contributor Author

After a blast of 503 I saw some lock files weren't deleted. No idea why for the moment.

@jakzal
Copy link
Contributor

jakzal commented Oct 18, 2015

Couldn't this be due to a missing clearstatcache in the isLocked function?

This shouldn't be the case. We use unlink() to remove the lock file and it clears the stat cache automatically.

@MatTheCat
Copy link
Contributor Author

I'm pretty sure the problem is caused by isLocked returning true when it shouldn't. Do you have any idea of how can this happen?

@mpdude
Copy link
Contributor

mpdude commented Oct 22, 2015

Maybe the lock files are created and then the process holding the lock crashes/terminates for some reason without removing the lock file?

Do you see this for a particular URL only or for different ones?

@mpdude
Copy link
Contributor

mpdude commented Oct 22, 2015

I've just had one of these occurences and my logs show that at the beginning of the event, the kernel oom-killer terminated an Apache child process (I'm using the prefork MPM with the PHP Apache module).

That would support the hypothesis of stale .lck files lying around after sudden termination of PHP processes.

I'd suggest switching over to flock()-based locking as in the https://github.com/symfony/symfony/blob/2.8/src/Symfony/Component/Filesystem/LockHandler.php

@mpdude
Copy link
Contributor

mpdude commented Oct 22, 2015

The fstat cache seems to be an issue as well, see #16312

@MatTheCat
Copy link
Contributor Author

Do you see this for a particular URL only or for different ones?

All heavily accessed URL using cache.

see #16312

I'm praying ^^

@mpdude
Copy link
Contributor

mpdude commented Oct 22, 2015

Even with #16312 we have the possibility that a forcefully terminated process leaves a stale .lck file lying around.

@MatTheCat
Copy link
Contributor Author

I only saw this once so it's not my main concern with this issue.

fabpot added a commit that referenced this issue Nov 28, 2015
…k file has been released (mpdude)

This PR was squashed before being merged into the 2.3 branch (closes #16312).

Discussion
----------

[HttpKernel] clearstatcache() so the Cache sees when a .lck file has been released

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #15813
| License       | MIT
| Doc PR        | n/a

I've been trying to debug #15813 and modified the Store in a way to keep unique request IDs in the .lck file. That way, I was hoping to find out which request is blocking and/or if the request is actually still running.

It turned out that `is_file()` would claim that a lock file still exists, but a subsequent attempt to read the information from that file returned "file not found" errors.

So, my assumption is that the `is_file()` result is based on the fstat cache and wrong once a process has seen the lock file.

@jakzal said in #15813 (comment) that `unlink()`ing the lock file should clear the statcache, but I doubt this is true across PHP processes.

Commits
-------

982710f [HttpKernel] clearstatcache() so the Cache sees when a .lck file has been released
@fabpot fabpot closed this as completed Nov 28, 2015
fabpot added a commit to symfony/http-kernel that referenced this issue Nov 28, 2015
…k file has been released (mpdude)

This PR was squashed before being merged into the 2.3 branch (closes #16312).

Discussion
----------

[HttpKernel] clearstatcache() so the Cache sees when a .lck file has been released

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #15813
| License       | MIT
| Doc PR        | n/a

I've been trying to debug #15813 and modified the Store in a way to keep unique request IDs in the .lck file. That way, I was hoping to find out which request is blocking and/or if the request is actually still running.

It turned out that `is_file()` would claim that a lock file still exists, but a subsequent attempt to read the information from that file returned "file not found" errors.

So, my assumption is that the `is_file()` result is based on the fstat cache and wrong once a process has seen the lock file.

@jakzal said in symfony/symfony#15813 (comment) that `unlink()`ing the lock file should clear the statcache, but I doubt this is true across PHP processes.

Commits
-------

982710f [HttpKernel] clearstatcache() so the Cache sees when a .lck file has been released
fabpot added a commit that referenced this issue Jan 25, 2016
…ased after two second wait (jakzal)

This PR was merged into the 2.3 branch.

Discussion
----------

[HttpKernel] Lookup the response even if the lock was released after two second wait

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

While looking into #15813 I noticed that we [wait for the lock to be released for five seconds, but then only do a lookup if the lock was released in two seconds](https://github.com/symfony/symfony/blob/fa604d3c6f16f264863a42c200391ab996640296/src/Symfony/Component/HttpKernel/HttpCache/HttpCache.php#L540-L562), no more.

I think it's worth to make both values the same (so either two or five seconds). I see no reason why we should wait for the lock for five seconds, but then only do a lookup if we waited for two. One way the wait either takes too long, the other way we loose the opportunity to actually return a response.

Commits
-------

9963170 [HttpKernel] Lookup the response even if the lock was released after 2 seconds
nicolas-grekas added a commit that referenced this issue Jul 28, 2016
This PR was merged into the 2.7 branch.

Discussion
----------

[HttpKernel] Use flock() for HttpCache's lock files

| Q             | A
| ------------- | ---
| Branch?       | 2.7
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | no
| Fixed tickets | #16777, #15813 and #16312 are also related
| License       | MIT
| Doc PR        |

When a PHP process crashes or terminates (maybe the OOM killer kicks in or other bad things ™️ happen) while the `HttpCache` holds a `.lck` file, that lock file may not get `unlink()`ed.

The result is that other requests trying to access this cache entry will see a few seconds delay while waiting for the lock; they will eventually continue but send 503 status codes along with the response. The sudden buildup of PHP processes caused by the additional delay may cause further problems (sudden load increase).

As `LockHandler` is using `flock()`-based locking, locks should be released by the OS when the PHP process terminates.

I wrote this as bugfix against 2.7 because every once in a while I encounter situations (not always reproducible) where `.lock` files are left over and keep the cache locked.

Commits
-------

2668edd [HttpKernel] Use flock() for HttpCache's lock files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants