Use case: add distributed and GPU support to SciPy #2
Comments
Certainly parallel algorithms are achievable today, and can grow over time. Every modern machine has a multicore CPU, so everyone can benefit from this improvement.
Thanks @tdimitri. Yes, I know parallel algorithms are feasible, and SciPy has some (mostly trivial parallelism though). There are issues that need careful thought, though - around maintainability, portability, nested parallelism, distribution of binaries when you don't control the whole stack, and more. I intend to do a thorough write-up in a separate issue.
I look forward to the write-up, as I believe together we can address most issues. We need a few OS abstractions: thread creation/destruction, wakeup, and a few lock-free primitives such as an interlocked add. These are known for all operating systems and thus do not require linking to Boost or other OS-abstraction packages. We can support just a few operating systems in a first pass. That gives us full control of the stack and portability for the first version. Again, threading can always be off (or default to off) with no side effects or harm to existing code. Then we can parallelize the easiest routines first, such as unary functions - a minimal sketch of the chunking idea is shown below. We do not have to parallelize everything; that will take time. What is important is that we begin instead of waiting for a perfect solution. It is also worthwhile to vectorize while threading.
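For illustration only, here is a minimal Python-level sketch of the chunk-and-thread pattern for a unary function. The proposal above is about C-level OS threads inside the library; this sketch only shows how the work could be split, and threads mainly help here when the inner function releases the GIL (as many NumPy ufuncs do for large inputs). The name `parallel_unary` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_unary(func, x, num_workers=4):
    # Split the input into one chunk per worker and apply the unary
    # function to each chunk on its own thread.
    chunks = np.array_split(x, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(func, chunks))
    return np.concatenate(results)


x = np.random.default_rng(0).standard_normal(1_000_000)
assert np.allclose(parallel_unary(np.sin, x), np.sin(x))
```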
Added in the "Use cases" section in master, closing. |
When a representative set of advanced users and research software engineers was surveyed in 2019 (for this NSF proposal), the single most common pain point brought up about SciPy was performance.
SciPy heavily relies on NumPy (its only non-optional runtime dependency). NumPy provides an array implementation that's in-memory, CPU-only and single-threaded. Common things users ask for are:

- parallel algorithms, to make use of all CPU cores
- support for GPUs
- support for distributed arrays
Some parallelism, in particular via `multiprocessing`, can be supported. However, SciPy itself will not directly start depending on a GPU or distributed array implementation, or contain (e.g.) CUDA code - that's not maintainable given the resources for development. There is still a way to provide distributed or GPU support. Part of the solution is provided by NumPy's "array protocols" (see gh-1), which allow dispatching to other array implementations; a sketch of how this works for pure Python code is shown after the list below. The main problem then becomes how to know whether this will work with a particular distributed or GPU array implementation - given that there are zero other array implementations that are even close to providing full NumPy compatibility - without adding it as a hard dependency.

It's clear that SciPy functionality that relies directly on compiled extensions (C, C++, Cython, Fortran) won't work. Pure Python code can work though. There are two main possibilities:

1. Test SciPy functions against each candidate array implementation, and document per function what is found to work.
2. Rely on a well-defined subset of the NumPy API, and only use that subset in SciPy's pure Python code.
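To make the dispatching idea concrete, here is a minimal, illustrative sketch (not actual SciPy code). `toy_zscore` is a hypothetical pure-Python routine that only calls NumPy functions covered by the `__array_function__` protocol, so an array library that implements that protocol (CuPy and Dask are common examples; neither is imported here) would execute those calls with its own implementation instead of being converted to an ndarray.

```python
import numpy as np


def toy_zscore(x, axis=0):
    # np.mean and np.std dispatch on the type of `x` via __array_function__,
    # so a compatible GPU or distributed array is never copied to an ndarray.
    mean = np.mean(x, axis=axis, keepdims=True)
    std = np.std(x, axis=axis, keepdims=True)
    return (x - mean) / std


# Works with a plain ndarray; a CuPy or Dask array could be passed the same way.
x = np.arange(12.0).reshape(3, 4)
print(toy_zscore(x))
```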
Option (2) seems strongly preferable, and that "well-defined subset" is what an API standard should provide. Testing will still be needed to ensure there are no critical corner cases or bugs between array implementations; however, that is then a very tractable task. A sketch of what such testing could look like follows.
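As an illustration only (this is not SciPy's actual test suite), cross-library testing could be as simple as parametrizing existing tests over whichever array libraries happen to be installed. `toy_zscore` is the hypothetical routine from the previous sketch, and `dask.array` is used purely as an example backend.

```python
import importlib

import numpy as np
import pytest


def toy_zscore(x, axis=0):
    mean = np.mean(x, axis=axis, keepdims=True)
    std = np.std(x, axis=axis, keepdims=True)
    return (x - mean) / std


@pytest.fixture(params=["numpy", "dask.array"])
def xp(request):
    # Run each test once per array library; skip libraries that are not installed.
    try:
        return importlib.import_module(request.param)
    except ImportError:
        pytest.skip(f"{request.param} is not installed")


def test_toy_zscore_matches_numpy(xp):
    data = np.arange(12.0).reshape(3, 4)
    result = toy_zscore(xp.asarray(data))
    # np.asarray pulls the backend's result back to an ndarray for comparison.
    np.testing.assert_allclose(np.asarray(result), toy_zscore(data))
```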