Skip to content

Make cache stampede protection optional, configurable, and more flexible #27729

Closed
@pounard

Description

@pounard

Following issue #27604 where @nicolas-grekas added the LockRegistry class to prevent cache stampede, I feel the need to re-open the discussion about the current implementation. Here are the details I wan't to raise red flag onto:

Issue #27009 add a new bit of documentation:

added CacheInterface, which provides stampede protection via probabilistic early expiration and should become the preferred way to use a cache

Whereas I do agree with this statement, cache stampede protection is a good thing, it should nevertheless not be hardcoded in current existing backends using the GetTrait. For example, it would be very easy to implement a Redis adapter that does atomic operations or provide locking by its own, case in which the need for this stampede protection algorithm would be less necessary.

Furthermore, #27604 introduces, under the same GetTrait::get() implementation a lock based protection (which means that cache stampede protection uses two different mecanisms, both are great, and using both is legit) but it the developper should be able to choose whether or not each cache is critical or not. Other valid mechanisms can be (as documented in https://en.wikipedia.org/wiki/Cache_stampede and which for most of them I already did use independently depending on the use case):

Wait until the value is recomputed
Return a "not-found" and have the client handle the absence of the value properly
Keep a stale item in the cache to be used while the new value is recomputed

All three are valid choices, it all depend on your business, and I think that Symfony hardcoding any kind of cache stampede implementation is loosing flexibility, and adding overhead in situations it's not necessary. People that do configure the application (may be often developers themselves, but also it could be sysadmins) should be able to choose different strategies for different business components in order to prevent cache stampede.

Right now, none of this is important, because nobody uses the CacheInterface yet - but as soon as people will start using it, problems will start to rise.

My proposal for fixing this are (and we choose all of, none of, some of, or just discuss about):

  • make it configurable at a global level, using two new configuration options: one for toggling lock based protection, and the other for toggling probabilistic early expiration based protection,
  • make it configurable per pool, using the same two variables, but on a cache pool or namespace basis,
  • extract the LockRegistry usage from the hardcoded GetTrait::get() and place it in a decorating implementation instead: LockStampedeProtectedCacheProxy for example,
  • extract the probabilistic early expiration too, and the same way implement it via a proxy, good thing about that would be that it could be easily applied to other cache implementations too (not only the CacheInterface).

Problem is that, as soon as everyone will be using CacheInterface, everyone will implicitely use both of the locked based protection (which is prone to errors or just be useless depending on how is setup the file system under) and the probabilistic early expiration (which may add unecessary overhead for many, many contextes).

Metadata

Metadata

Assignees

No one assigned

    Labels

    CacheRFCRFC = Request For Comments (proposals about features that you want to be discussed)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions