-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[WIP] PERF Parallelize W/H updates of NMF with OpenMP #16439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Please add something in the UG and maybe also in the docstring to link to https://scikit-learn.org/stable/modules/computing.html#openmp-based-parallelism |
@jeremiedbb: this already looks like a nice improvement! Is there anything left to do here? 🙂 |
I tried to run the benchmark again and found very weird results. After digging it appears that there's a weird interaction between some builds of OpenBLAS and OpenMP, making it 10x slower in some cases. This is not expected and I opened OpenMathLib/OpenBLAS#3187. Since it occurs with the openblas shipped by default by conda-forge I think we should wait a bit before merging this. |
Should we try to update and revisit this PR (without using the call to |
Continuation of #6641
Here's a benchmark of the speedup using 2 threads. When the number of components is large enough the speed-up approaches 2x (maybe a bit better than 2x because I switched to a blas

dot
instead of a manual loop).Since it now uses a BLAS function inside an OpenMP loop, we first need to prevent oversubscription with threadpoolctl before we can merge this pr.