[Semaphore] Added the component #35780

Merged
merged 1 commit into from
Aug 27, 2020
@lyrixx (Member) commented Feb 18, 2020

| Q | A |
| --- | --- |
| Branch? | yes |
| Bug fix? | no |
| New feature? | yes |
| Deprecations? | no |
| Tickets | |
| License | MIT |
| Doc PR | |

[Semaphore] Added the component

A few years ago, we introduced the Lock component. It is a very nice component, but sometimes it is not enough: sometimes you need a semaphore.

This is why I'm introducing this new component.

What is a Semaphore?

From Wikipedia:

In computer science, a semaphore is a variable or abstract data type used to control access to a common resource by multiple processes in a concurrent system such as a multitasking operating system. A semaphore is simply a variable. This variable is used to solve critical section problems and to achieve process synchronization in the multi processing environment. A trivial semaphore is a plain variable that is changed (for example, incremented or decremented, or toggled) depending on programmer-defined conditions.

This new component is more than a variable: it is an abstraction on top of different storage backends.

To make a quick comparison with a lock:

  • A lock allows only 1 process to access a resource;
  • A semaphore allows N processes to access a resource.

Basically, a lock is a semaphore where N = 1.
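
To make the comparison concrete, here is a minimal sketch. `InMemorySemaphore` is a hypothetical class written only for this illustration: it is not the component's API, and it counts slots inside a single PHP process instead of sharing them through a store.

```php
<?php

// Hypothetical in-memory counter, for illustration only.
// It does not synchronize anything across processes.
final class InMemorySemaphore
{
    private int $limit;
    private int $holders = 0;

    public function __construct(int $limit)
    {
        if ($limit < 1) {
            throw new InvalidArgumentException('The limit must be at least 1.');
        }
        $this->limit = $limit;
    }

    // Take one slot if one is free; never blocks.
    public function acquire(): bool
    {
        if ($this->holders >= $this->limit) {
            return false;
        }
        ++$this->holders;

        return true;
    }

    // Give one slot back.
    public function release(): void
    {
        if ($this->holders > 0) {
            --$this->holders;
        }
    }
}

// A semaphore with N = 3: the fourth acquisition is refused.
$semaphore = new InMemorySemaphore(3);
$results = [];
for ($i = 0; $i < 4; ++$i) {
    $results[] = $semaphore->acquire();
}

// With N = 1 it behaves like a lock: the second acquisition is refused.
$lock = new InMemorySemaphore(1);
$first = $lock->acquire();
$second = $lock->acquire();
```

Here `$results` ends up as `[true, true, true, false]`, while `$first` is `true` and `$second` is `false`.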

Possible confusion

PHP exposes some sem_* functions like sem_acquire (http://php.net/sem_acquire). They come from an extension that provides wrappers for the System V IPC family of functions, which includes semaphores, shared memory and inter-process messaging (IPC).

The Lock component has a store that works with these functions. It uses them with N = 1.

What are the use cases?

Wikipedia has some examples (https://en.wikipedia.org/wiki/Semaphore_(programming)#Examples).

But I can add one more common use case.

If you are building an async system that processes user data, you may want to prioritize all jobs. You can achieve that by running at most N jobs per user at the same time. If the user has more resources, you give them more concurrent jobs (so a bigger N).

Thanks to semaphores, it's pretty easy to know if a new job can be run.
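
As a rough illustration of the per-user limit, the check boils down to comparing a running count with N. The function name `canStartJob`, the sample users and the default limit of 1 are all assumptions made for this sketch. With the component, the counting would presumably live in the shared store, with one semaphore per user.

```php
<?php

// Hypothetical data: per-user limit N and number of jobs currently running.
$limits = ['alice' => 1, 'bob' => 3];
$running = ['alice' => 1, 'bob' => 2];

// Returns true when the user still has a free slot.
function canStartJob(string $user, array $limits, array $running): bool
{
    $limit = $limits[$user] ?? 1;    // assumed default when no limit is configured
    $current = $running[$user] ?? 0; // a user without running jobs holds no slot

    return $current < $limit;
}

$aliceCanStart = canStartJob('alice', $limits, $running);
$bobCanStart = canStartJob('bob', $limits, $running);
```

With this data `$aliceCanStart` is `false` (1 of 1 slots used) and `$bobCanStart` is `true` (2 of 3 slots used).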

Some concrete use-cases

I'm not saying the following services use semaphores, but they could solve the previous problem with semaphores. Here are some examples:

  • services like testing platforms where a user can test N projects concurrently (travis, circle, appveyor, insight, ...)
  • services that ingest lots of data (newrelic, datadog, blackfire, segment.io, ...)
  • services that send emails in batches (campaign monitor, mailchimp, ...)
  • etc.

How to use it?

To do so, since PHP is single-threaded, you run M PHP workers. In each worker, you look for the next job. When you grab a job, you try to acquire a semaphore. If you get it, you process the job. If not, you try another job.

FTR, in other languages, like Go, there is no need to run M workers; one is enough.
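
The "grab a job, try to acquire, otherwise try another job" loop can be sketched as follows. Everything here is hypothetical: `PerUserSlots` stands in for a shared semaphore store, `pickJobs` for one pass of the workers over the queue, and slots are deliberately not released, to simulate jobs that are still running on other workers.

```php
<?php

// Hypothetical stand-in for a shared semaphore store: slots held per user.
final class PerUserSlots
{
    private int $limit;
    /** @var array<string, int> */
    private array $held = [];

    public function __construct(int $limit)
    {
        $this->limit = $limit;
    }

    // Take one slot for the user if one is free; never blocks.
    public function acquire(string $user): bool
    {
        $held = $this->held[$user] ?? 0;
        if ($held >= $this->limit) {
            return false;
        }
        $this->held[$user] = $held + 1;

        return true;
    }
}

// One pass over the pending jobs: start a job when a slot is free for its
// user, otherwise skip it and try the next one.
function pickJobs(array $jobs, PerUserSlots $slots): array
{
    $started = [];
    $skipped = [];
    foreach ($jobs as $job) {
        if ($slots->acquire($job['user'])) {
            $started[] = $job['id'];
        } else {
            $skipped[] = $job['id'];
        }
    }

    return [$started, $skipped];
}

$jobs = [
    ['id' => 1, 'user' => 'alice'],
    ['id' => 2, 'user' => 'alice'],
    ['id' => 3, 'user' => 'alice'],
    ['id' => 4, 'user' => 'bob'],
];

// At most 2 concurrent jobs per user.
[$started, $skipped] = pickJobs($jobs, new PerUserSlots(2));
```

Jobs 1, 2 and 4 are started; job 3 is skipped because alice already holds her 2 slots, and it would be retried on a later pass.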

With Symfony

```php
<?php

use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Lock\Store\RedisStore as LockRedisStore;
use Symfony\Component\Semaphore\SemaphoreFactory;
use Symfony\Component\Semaphore\Store\RedisStore;

require __DIR__.'/vendor/autoload.php';

$redis = new Redis();
$redis->connect('172.17.0.2');

// Internally, Semaphore needs a lock
$lock = (new LockFactory(new LockRedisStore($redis)))->createLock('test:lock', 1);

// Create a semaphore:
// * name = test
// * limit = 3 (it means only 3 process are allowed)
// * ttl = 10 seconds : Maximum expected semaphore duration in seconds
$semaphore = (new SemaphoreFactory($lock, new RedisStore($redis)))->createSemaphore('test', 3, 10);

if (!$semaphore->acquire()) {
    echo "Could not acquire the semaphore\n";
    exit(1);
}

// The semaphore has been acquired

// Do the heavy job
for ($i = 0; $i < 100; ++$i) {
    sleep(1);
    // Before the expiration, refresh the semaphore if the job is not finished yet
    if ($i % 9 === 0) {
        $semaphore->refresh();
    }
}

// Release it when finished
$semaphore->release();
```

Prior art

I looked at Packagist (https://packagist.org/?query=semaphore) and:

  • most packages use a semaphore storage to create a lock, so they are not relevant here;
  • some packages need an async framework to be used (amphp for example);
  • the only packages really implementing a semaphore have really low code quality and some bugs.

Current implementation

  1. I initially copied the Lock component since the external API is quite similar;
  2. I simplified it a lot for the current use case;
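  3. I implemented the RedisStorage according to the Redis book (https://redislabs.com/ebook/part-2-core-concepts/chapter-6-application-components-in-redis/6-3-counting-semaphores/);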
  4. I forced a TTL on the storage.
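
The point of a forced TTL is that a slot whose holder never releases it (a crashed worker, for example) becomes free again once it expires, and that a long job must refresh before that happens. The sketch below shows the idea with a hypothetical in-memory `ExpiringSlots` class. It is not the RedisStore implementation, and the clock is passed in explicitly so the behaviour is easy to follow.

```php
<?php

// Hypothetical in-memory store, for illustration only.
final class ExpiringSlots
{
    private int $limit;
    private float $ttl;
    /** @var array<string, float> holder id => expiry timestamp */
    private array $expiresAt = [];

    public function __construct(int $limit, float $ttlInSeconds)
    {
        $this->limit = $limit;
        $this->ttl = $ttlInSeconds;
    }

    // Take (or re-take) a slot for $holder at time $now; never blocks.
    public function acquire(string $holder, float $now): bool
    {
        $this->purge($now);
        if (!isset($this->expiresAt[$holder]) && count($this->expiresAt) >= $this->limit) {
            return false;
        }
        $this->expiresAt[$holder] = $now + $this->ttl;

        return true;
    }

    // Extend a slot that is still held; fails once it has expired.
    public function refresh(string $holder, float $now): bool
    {
        $this->purge($now);
        if (!isset($this->expiresAt[$holder])) {
            return false;
        }
        $this->expiresAt[$holder] = $now + $this->ttl;

        return true;
    }

    // Drop every slot whose TTL has elapsed.
    private function purge(float $now): void
    {
        foreach ($this->expiresAt as $holder => $expiry) {
            if ($expiry <= $now) {
                unset($this->expiresAt[$holder]);
            }
        }
    }
}

$slots = new ExpiringSlots(1, 10.0);               // 1 slot, 10 second TTL
$a = $slots->acquire('worker-a', 0.0);             // held until t = 10
$b = $slots->acquire('worker-b', 5.0);             // refused, slot is taken
$refreshed = $slots->refresh('worker-a', 9.0);     // now held until t = 19
$c = $slots->acquire('worker-b', 12.0);            // still refused
$d = $slots->acquire('worker-b', 19.0);            // worker-a expired, slot is free
$tooLate = $slots->refresh('worker-a', 20.0);      // refused, the slot was lost
```

In this run `$a`, `$refreshed` and `$d` are `true`, while `$b`, `$c` and `$tooLate` are `false`.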

TODO:

  • documentation
  • test
  • move the lock requirements to the redis storage only? (Not needed anymore)

@OskarStark (Contributor) left a comment

Nice 👍🏻 I couldn’t say much about the code itself but it should be experimental in the beginning 🙂

@GromNaN (Member) commented Feb 18, 2020

Very useful feature for distributed systems when you need to limit concurrency. We used RabbitMQ to store semaphores.

@fbourigault (Contributor) commented Feb 19, 2020

This is exactly what I need to limit outgoing HTTP connections for a given client.

@nicolas-grekas nicolas-grekas added this to the next milestone Feb 20, 2020
@jderusse (Member) commented

Great idea! I like it.

Wouldn't it make sense to merge it with the Lock component?

I also have an enhancement on my todo list: adding a read/write lock (well known as SharedLock for flock) to the Lock component (basically being able to get N (infinite) locks for reads, but 1 lock for writes). It sounds like something really close to this new Semaphore component.

@jderusse (Member) left a comment

Nice work, with an effective use of Redis' data structures.

What about high availability?

@lyrixx (Member, Author) commented Mar 3, 2020

> Wouldn't it make sense to merge it with the Lock component?

I don't think so. Even if these 2 components look similar, they are not. They do not share code (except the Key and StoreFactory classes, but they are different).

Semaphore does not require lock anymore.

Again, these concepts are similar, but they are not the same. That's why they have two different names.

PHP, Linux, Go and Rust have different packages for these 2 purposes. I think we should stay consistent with the other big players.

Moreover, it will be:

  • better for discoverability
  • easier to maintain (I guess)
  • easier to get more accurate usage stats

> What about high availability?

I don't see the issue here. Could you be more specific?

@stof (Member) previously requested changes Mar 3, 2020

The replace rule is missing in the root composer.json file.

@bwoebi commented Mar 5, 2020

@lyrixx amphp/sync is relying on amp promises. It does not need the loop though...

@lyrixx lyrixx force-pushed the semaphore branch 2 times, most recently from 14e6f37 to 68e21aa Compare March 19, 2020 17:29
@lyrixx (Member, Author) commented Mar 19, 2020

I changed the implementation a bit. Now it's not as in the book. But since I'm using a Lua script, I'm sure no race condition is possible (I tested it).
Moreover, it's not possible to get an "unfair" semaphore.

The system is now simpler and more powerful.

@lyrixx lyrixx requested a review from jderusse March 19, 2020 21:53
@jderusse (Member) left a comment

LGTM.

Just a question: in the Lock component we provide a "CombinedStore" in order to provide high availability and guarantee reliability even when a Redis server is unreachable. Do you think it is worth it here?

@lyrixx (Member, Author) commented Mar 19, 2020

> Just a question: in the Lock component we provide a "CombinedStore" in order to provide high availability and guarantee reliability even when a Redis server is unreachable. Do you think it is worth it here?

I don't know. ATM we only have Redis so I don't think this is useful. Maybe later?

@fabpot (Member) commented Aug 11, 2020

@lyrixx @jderusse What's the status here?

@lyrixx (Member, Author) commented Aug 11, 2020

To me it was ready months ago.

@fabpot (Member) previously requested changes Aug 11, 2020

It looks like the code in the PR description is not up to date anymore, can you update it and provide some code that matches with the current implementation?

Also, as the Redis store had some changes recently, would it make sense to backport some of them here?

@fabpot (Member) commented Aug 18, 2020

@lyrixx Can you have a look at the feedback?

@lyrixx lyrixx force-pushed the semaphore branch 2 times, most recently from 6575696 to 6b7d0f4 Compare August 27, 2020 10:13
@lyrixx (Member, Author) commented Aug 27, 2020

@jderusse @fabpot I have addressed your comments. I hope it's Okay now :)

@lyrixx lyrixx requested a review from fabpot August 27, 2020 10:21
@lyrixx lyrixx requested a review from jderusse August 27, 2020 10:21
@lyrixx lyrixx dismissed stale reviews from stof and fabpot August 27, 2020 10:22

Code has been updated

@lyrixx (Member, Author) commented Aug 27, 2020

Note: The CI is broken, I'm trying to fix it.

@jderusse (Member) left a comment

Looks good to me, plus a few minor comments.

@lyrixx (Member, Author) commented Aug 27, 2020

Semaphore component is now 💚 in the CI

@fabpot (Member) commented Aug 27, 2020

Thank you @lyrixx.

@fabpot fabpot merged commit ce8b497 into symfony:master Aug 27, 2020
@lyrixx lyrixx deleted the semaphore branch August 27, 2020 14:45
@nicolas-grekas nicolas-grekas modified the milestones: next, 5.2 Oct 5, 2020
@fabpot fabpot mentioned this pull request Oct 5, 2020