A plan to safely support OpenMP in our Cython code base #7650

Closed
ogrisel opened this issue Oct 12, 2016 · 2 comments

@ogrisel
Member

ogrisel commented Oct 12, 2016

This is an informational roadmap issue.

At the moment it is not safe to use OpenMP for low-overhead, thread-based parallelism in our Cython code base because of a bad interaction between multiprocessing.Pool (under Python 2, worker processes are necessarily created by fork without exec) and libgomp, the OpenMP runtime library used by GCC. This can cause the program to freeze silently.

A workaround for Python 3.4 and later is documented here: https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries, but it has some side effects and does not work for Python 2.7.
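
For illustration, here is a minimal sketch of that kind of workaround, assuming it amounts to selecting a multiprocessing start method other than plain fork (available since Python 3.4). The helper names `pick_safe_start_method` and `parallel_sqrt` are hypothetical and not part of joblib or scikit-learn.

```python
import math
import multiprocessing


def pick_safe_start_method():
    """Return a start method that does not fork the current process.

    'forkserver' is preferred where the platform offers it; 'spawn' is
    available on every platform supported by multiprocessing.
    """
    available = multiprocessing.get_all_start_methods()
    return "forkserver" if "forkserver" in available else "spawn"


def parallel_sqrt(values, n_workers=2):
    """Map math.sqrt over values in a pool that avoids plain fork."""
    # get_context leaves the global default start method untouched,
    # unlike multiprocessing.set_start_method.
    ctx = multiprocessing.get_context(pick_safe_start_method())
    with ctx.Pool(processes=n_workers) as pool:
        return pool.map(math.sqrt, values)
```

With these start methods the main module is re-imported in the child processes, so `parallel_sqrt` should only be called from under an `if __name__ == "__main__":` guard. That requirement, together with the slower worker startup, is one example of the side effects mentioned above.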

To mitigate this issue we (@tomMoral and I) are currently experimenting with a promising new process pool management system: https://github.com/tomMoral/loky

It uses low-level multiprocessing primitives (queues based on pipes locked via semaphores for interprocess communication) and some code from the concurrent.futures module. The API is compatible with Python 3's ProcessPoolExecutor class but:

  • we also support Python 2.7 (by maintaining a backport of missing Python code and using ctypes to manage semaphores by calling into libpthread) without any compiled extension,
  • the worker processes are spawned (fork with exec), so we don't break the OpenMP runtime,
  • contrary to multiprocessing.Pool and the default Python 3 ProcessPoolExecutor class, we can robustly detect whenever a worker process or an internal management thread has terminated (e.g. segfault, user-issued kill -9, operating system out-of-memory killer, faulty pickling in the payload), raise a specific exception, and destroy the remaining workers deterministically instead of freezing silently,
  • an existing pool instance can be resized (to add or remove worker processes) incrementally.
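
Because the API follows ProcessPoolExecutor, calling code can stay backend-agnostic. The sketch below assumes loky exposes `get_reusable_executor` (as its README describes) and falls back to the standard library with a spawn context when loky is not installed; `make_executor` and `total_of_squares` are hypothetical helper names used only for illustration.

```python
import concurrent.futures
import multiprocessing


def make_executor(max_workers=2):
    """Return an executor exposing the ProcessPoolExecutor API."""
    try:
        # Assumption: loky is installed and provides get_reusable_executor.
        from loky import get_reusable_executor
    except ImportError:
        # Standard-library fallback (Python 3.7+): spawn starts each worker
        # in a fresh interpreter instead of forking the parent.
        ctx = multiprocessing.get_context("spawn")
        return concurrent.futures.ProcessPoolExecutor(
            max_workers=max_workers, mp_context=ctx
        )
    return get_reusable_executor(max_workers=max_workers)


def total_of_squares(executor, values):
    """Submit pow(v, 2) for each value and sum the results."""
    # Only submit() and Future.result() are used, so any executor that
    # follows the concurrent.futures API can be passed in.
    futures = [executor.submit(pow, v, 2) for v in values]
    return sum(future.result() for future in futures)
```

As with any spawn-based pool, `total_of_squares(make_executor(), ...)` is best called from under an `if __name__ == "__main__":` guard. If a worker dies abruptly, `Future.result()` is where the specific exception would surface instead of a silent freeze.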

Note that the robustification of ProcessPoolExecutor is planned to be contributed upstream (e.g. for Python 3.7).

Once this work is complete (code cleanup, simplification, refactoring, documentation + more tests), we plan to make it the default backend for joblib (after benchmarking it) and then synchronize the embedded joblib in sklearn to benefit from this.

At that point we will be able to use Cython prange and other OpenMP-backed constructs safely in scikit-learn, for instance as suggested in #6641.

@ogrisel changed the title from "A plan to support OpenMP in our code base" to "A plan to safely support OpenMP in our Cython code base" on Oct 12, 2016
@amueller
Member

wow you rock :)

@rth
Member

rth commented Jun 11, 2019

Closing: thanks to loky, OpenMP can now be used in Cython :)
