[Messenger] Batch handlers v2 #47090

upyx · 2022-07-28T06:04:08Z

Q	A
Branch?	6.2
Bug fix?	no
New feature?	yes
Deprecations?	possible...
Tickets	nope
License	MIT
Doc PR	TBD

The Messenger component is a good core of asynchronous applications. Since my team utilizes event-driven architecture, we heavily rely on the library. We've been collecting many ideas on improvement over the past years. Most of them are about extending and flexibility, so they require internal changes to the library. Unfortunately, some changes block each other, so it's difficult to do them piece by piece. Nevertheless, I will try. The first small piece is below.

Introdution

The common example of batch handler (merged in #43354) is:

class MyBatchHandler implements BatchHandlerInterface
{
    use BatchHandlerTrait;

    public function __invoke(MyMessage $message, Acknowledger $ack = null)
    {
        return $this->handle($message, $ack);
    }

    private function process(array $jobs): void
    {
        foreach ($jobs as [$message, $ack]) {
            try {
                // [...] compute $result from $message
                $ack->ack($result);
            } catch (\Throwable $e) {
                $ack->nack($e);
            }
        }
    }
}

Where the compute $result from $message is a place for real handler job. The other things are part of batch processing itself, which the handler shouldn't care about.

Proposal

In my vision, the ideal batch handler would look like this:

#[AsMessageHandler(batchStrategy: new BatchCount(10))]
class MyBatchHandler
{
    public function __invoke(MyMessage ...$messages)
    {
        // handle $messages
    }
}

Or an interface alternative to an attribute:

class MyBatchHandler implements BatchStrategyProviderInterface
{
    public function __invoke(MyMessage ...$messages)
    {
        // handle $messages
    }

    public function getBatchStrategy(): BatchStrategyInterface
    {
        return new CountBatchStrategy(10);
    }
}

A bit easier, isn't it? However, this way disables the features which we are used to: separate acknowledging and returning results. There is an option to return them, I'll descire it below.

How batches work

A batch processing consists of parts:

Buffering
Handling the buffer
Acknowledge received messages

The first part (buffering) can be easily done by the library internals. The only thing it needs is trigger where to flush the buffer (run a handler). The second part is the handler's job. And the acknowledging... hm, it's some opinionable part.

Triggering a handler

Generally, rules for triggering a handler are more complex than just message counting. Most likely they would look like: "do not process a batch faster than once per 10 seconds if the buffer is less than 100 messages". Rules can be different, and most likely they would be shared between handlers in a group.

class MyBatchStrategy implements BatchStrategyInterface
{
    private int $bufferSize = 0;
    private int $lastHandledNanoseconds = 0;

    public function shouldHandle(object $lastMessage): bool
    {
        return ($lastMessage->forceHandling)
            || (++$this->bufferSize >= 100)
            || (\hrtime(true) - $this->lastHandledNanoseconds > 1_000_000_000);
    }

    public function beforeHandle(): void
    {
        $this->bufferSize = 0;
    }

    public function afterHandle(): void
    {
        $this->lastHandledNanoseconds = \hrtime(true);
    }
}

Acknowledging

I believe that a batch must be acknowledged or rejected entirely at once. Because it's a batch. If someone needs to acknowledge messages separately, it means that there is no batch but some infrastructure specific optimization or misuse. So, if a handler finishes without an error, the entire batch should be acknowledged.

Returning results

Handlers shouldn't care how they are run synchronously or asynchronously, they should do their job interchangeably and never return a result, but save it or dispatch further. Here is a perfect explanation of the difference between Messenger and Event Dispatcher. It refers to the main architecture idea of the Messenger which makes it so powerful: it never communicates back.

The possibility of returning results through synchronous transport is a historical accident and a misuse of the tool.

Use legacy features

I absolutely sure they are useless. However, it's might be overkill to change all legacy code. As I promised, there is a way to use them:

#[AsMessageHandler(batchStrategy: new BatchCount(10))]
class MyBatchHandler
{
    public function __invoke(Result $r, MyMessage ...$messages)
    {
        $results = $errors = [];
        try {
                // compute $results from $messages
        } catch (\Throwable $e) {
                // save $errors
        }
        foreach ($results as [$message, $result]) $r->ack($message, $result);
        foreach ($errors as [$message, $error]) $r->reject($message, $error);
    }
}

The Result object has a SplObjectStorage that maps messeges with Acknowledger, so it can be shared.

Deprecations

I'd like to deprecate BatchHandlerInterface, Acknowledger and HandledStamp::getResult().

Implementation

There is no proper implementation yet. The PR has a dirty draft to show an idea and play with it. Some tests failed because of a new optional argument in HandlerDescriptor, but it works.

nicolas-grekas · 2022-07-29T11:43:33Z

Hi, thanks for the proposal and for opening the discussion.

I believe that a batch must be acknowledged or rejected entirely at once. Because it's a batch. If someone needs to acknowledge messages separately, it means that there is no batch but some infrastructure specific optimization or misuse. So, if a handler finishes without an error, the entire batch should be acknowledged.

I have a different opinion here: acknowledging each message separately provides flexibility in handling batches. Eg one can process N messages in parallel and ack/nack as needed. This is a desired feature to me. On the contrary, acknowledging the batch itself would give a false impression that batches are transactions. They're not, since we can't guarantee anything from a transactional pov.

I fear that this difference invalidates most of the proposal :(

upyx · 2022-07-29T12:48:55Z

I have a different opinion here

So let's discuss it :)

Eg one can process N messages in parallel and ack/nack as needed. This is a desired feature to me.

Interesting... How do batches help here? When I want parallel processing, I run N workers with "single" handler. Or I can create a custom middleware to process few messages in parallel by one worker with fiber (in PHP 8.2).

On the contrary, acknowledging the batch itself would give a false impression that batches are transactions. They're not, since we can't guarantee anything from a transactional pov.

We can't, it's true. However, a user can! And it's a typical thing that ~~users~~ I do with batch handlers.

Anyway, I've seen batches a lot, and have never (can't remember at least) seen that partially acknowledging helps. Could you share your experience and give some example?

I fear that this difference invalidates most of the proposal :(

Absolutely not :) The Result object in the example is used to acknowledge and set results to messages separately.

nicolas-grekas · 2022-08-01T08:34:41Z

The Result object in the example is used to acknowledge and set results to messages separately

Note that the current design keeps LSP, aka batch handlers remain valid regular handlers. Putting Result as first argument breaks this since batch handlers can't be called as regular handlers anymore. We'd need to preserve this capability.

lyrixx · 2022-08-01T16:44:30Z

I agree with @nicolas-grekas about

acknowledging each message separately provides flexibility in handling batches

Anyway, I've seen batches a lot, and have never (can't remember at least) seen that partially acknowledging helps. Could you share your experience and give some example?

Some exemples

Instead of calling Elasticsearch each time something changes, we buffer everything and we call the bulk API. It much more efficient for ES. Then is a document cannot be inserted, we can reject the message and send it to a dead letter queue for later inspection.
Instead of calling a rate limited API (of translation, but we don't care), we buffer all message, and call it every X secondes, or when the buffer is full. (see https://jolicode.com/blog/batching-symfony-messenger-messages for more information - and an old implementation)

In both situation, we are not in a "transactional mode". So beeing able to ack all messages one by one is a must have feature.

About the API, I prefer a new interface too instead of a trait. But we already discussed that in the PR (or mine, I don't remember). And the trait was finally chosen.

upyx · 2022-08-02T08:54:13Z

The more features, the more code, the harder to keep things clear and flexible. I'm trying to simplify handlers. Especially batch handlers.

@nicolas-grekas

Note that the current design keeps LSP, aka batch handlers remain valid regular handlers. Putting Result as first argument breaks this since batch handlers can't be called as regular handlers anymore. We'd need to preserve this capability.

The capability preserved by HandlerDescriptor and HandlerLocator semantically. There is no superclass to keep LSP by language features.

I'd like to make a single (m.b. internal) interface for all kinds of handlers and few sequential adapters which simplify work for users and keep BC. So the user would make a choice to implement the fullfeatured interface or "just a handler" part. It moves the logic about batches, buffers, acknowledging, and so on from the middleware internals (see how simple it was originally) and incapluletes it in places that are intended to.

Than you for examples @lyrixx !

Well, I see a lot of developers probably want to see that feature...

I won't argue for this.

see https://jolicode.com/blog/batching-symfony-messenger-messages for more information

We are using something close to that right now. As you can see, a batch handler processes a message with the whole batch.

upyx · 2022-08-02T08:58:28Z

Anyway, is this interesting propose or just close it?

upyx · 2022-08-23T02:59:30Z

The answer is clear. Maybe next time...

Batch handler draft

d3c7a2e

carsonbot added Status: Needs Review Feature Messenger RFC RFC = Request For Comments (proposals about features that you want to be discussed) labels Jul 28, 2022

carsonbot added this to the 6.2 milestone Jul 28, 2022

carsonbot changed the title ~~[RFC][Messenger] Batch handlers v2~~ [Messenger] Batch handlers v2 Jul 28, 2022

upyx mentioned this pull request Jul 30, 2022

[Messenger] Be able to start a worker for multiple queues with custom consumption priorities #45882

Open

upyx closed this Aug 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Messenger] Batch handlers v2 #47090

[Messenger] Batch handlers v2 #47090

Uh oh!

upyx commented Jul 28, 2022

Uh oh!

nicolas-grekas commented Jul 29, 2022

Uh oh!

upyx commented Jul 29, 2022

Uh oh!

nicolas-grekas commented Aug 1, 2022 •

edited

Loading

Uh oh!

lyrixx commented Aug 1, 2022

Uh oh!

upyx commented Aug 2, 2022

Uh oh!

upyx commented Aug 2, 2022

Uh oh!

upyx commented Aug 23, 2022

Uh oh!

Uh oh!

Uh oh!

[Messenger] Batch handlers v2 #47090

[Messenger] Batch handlers v2 #47090

Uh oh!

Conversation

upyx commented Jul 28, 2022

Introdution

Proposal

How batches work

Triggering a handler

Acknowledging

Returning results

Use legacy features

Deprecations

Implementation

Uh oh!

nicolas-grekas commented Jul 29, 2022

Uh oh!

upyx commented Jul 29, 2022

Uh oh!

nicolas-grekas commented Aug 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lyrixx commented Aug 1, 2022

Uh oh!

upyx commented Aug 2, 2022

Uh oh!

upyx commented Aug 2, 2022

Uh oh!

upyx commented Aug 23, 2022

Uh oh!

Uh oh!

nicolas-grekas commented Aug 1, 2022 •

edited

Loading