[WIP] PERF Parallelize W/H updates of NMF with OpenMP #16439

jeremiedbb · 2020-02-13T14:47:45Z

Continuation of #6641

Here's a benchmark of the speedup using 2 threads. When the number of components is large enough, the speedup approaches 2x (possibly slightly more than 2x, because I switched to a BLAS dot call instead of a manual loop).
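For readers following along, here is a rough pure-Python sketch of why the W update parallelizes over rows. It assumes the textbook coordinate-descent step for NMF (gradient from `HHt` and `XHt`, projected Newton step clipped at zero) and is not copied from this PR's Cython code. The names `update_row` and `update_w` are hypothetical. Python threads will not show a real speedup here because of the GIL. The point is only that each row of W can be updated independently once H is fixed.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(a, b):
    # Stand-in for the BLAS dot call mentioned above.
    return sum(x * y for x, y in zip(a, b))


def update_row(w_row, hht, xht_row):
    """One coordinate-descent sweep over a single row of W (returns a new row)."""
    w_row = list(w_row)
    for t in range(len(w_row)):
        grad = dot(hht[t], w_row) - xht_row[t]  # gradient w.r.t. W[i, t]
        hess = hht[t][t]                        # second derivative
        if hess != 0:
            # Newton step, projected onto the non-negative orthant.
            w_row[t] = max(w_row[t] - grad / hess, 0.0)
    return w_row


def update_w(W, HHt, XHt, n_threads=1):
    """Update every row of W; rows only read HHt and XHt, so they are independent."""
    if n_threads == 1:
        return [update_row(w, HHt, x) for w, x in zip(W, XHt)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda wx: update_row(wx[0], HHt, wx[1]), zip(W, XHt)))


# Tiny made-up problem: X is 3x3, H is 2x3, W is 3x2.
X = [[1.0, 2.0, 3.0], [2.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
H = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
HHt = [[dot(hs, ht) for ht in H] for hs in H]
XHt = [[dot(x, ht) for ht in H] for x in X]
W0 = [[1.0, 1.0] for _ in X]

W_seq = update_w(W0, HHt, XHt, n_threads=1)
W_par = update_w(W0, HHt, XHt, n_threads=2)
```

The sequential and threaded results match because no row reads another row's values during the sweep.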

Since it now calls a BLAS function inside an OpenMP loop, we first need to prevent oversubscription with threadpoolctl before this PR can be merged.
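As a minimal sketch of the idea (not the mechanism this PR ends up using), threadpoolctl's `threadpool_limits` can cap the BLAS thread pool while an outer parallel section runs, so that outer threads times BLAS threads does not exceed the number of cores. The helper names `blas_limited` and `run_outer_parallel_section` are hypothetical, and the no-op fallback when threadpoolctl is missing is only there to keep the sketch runnable anywhere.

```python
from contextlib import nullcontext

try:
    from threadpoolctl import threadpool_limits
except ImportError:  # threadpoolctl not installed: the sketch degrades to a no-op
    threadpool_limits = None


def blas_limited(n_threads=1):
    """Context manager capping BLAS threads, or a no-op without threadpoolctl."""
    if threadpool_limits is None:
        return nullcontext()
    return threadpool_limits(limits=n_threads, user_api="blas")


def run_outer_parallel_section(work):
    """Run `work` (assumed to be parallel on the outside) with BLAS held to 1 thread."""
    with blas_limited(1):
        return work()


result = run_outer_parallel_section(lambda: sum(range(10)))
```

The previous BLAS thread limits are restored when the `with` block exits.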

NicolasHug · 2020-02-13T15:58:37Z

Please add a note in the user guide, and maybe also in the docstring, linking to https://scikit-learn.org/stable/modules/computing.html#openmp-based-parallelism

doc/whats_new/v0.23.rst

jjerphan · 2021-04-20T09:48:37Z

@jeremiedbb: this already looks like a nice improvement! Is there anything left to do here? 🙂

jeremiedbb · 2021-04-22T12:23:38Z

I tried to run the benchmark again and got very strange results. After some digging, it appears that there's a bad interaction between some builds of OpenBLAS and OpenMP, making it 10x slower in some cases. This is not expected, so I opened OpenMathLib/OpenBLAS#3187. Since it occurs with the OpenBLAS build shipped by default by conda-forge, I think we should wait a bit before merging this.

jjerphan · 2023-07-23T19:38:57Z

Should we try to update and revisit this PR, dropping the call to _dot so that we avoid triggering OpenMathLib/OpenBLAS#3187?

macg0406 and others added 4 commits April 8, 2016 12:13

Parallelization _update_cdnmf_fast

0f261ad

Merge remote-tracking branch 'upstream/master'

6c16589

Merge branch 'master' into cd-nmf-openmp

24197b5

cln

b0737fc

jeremiedbb added Enhancement module:decomposition Performance labels Feb 13, 2020

jeremiedbb added 6 commits February 13, 2020 15:53

what's new

aa22446

lint

a300eba

update what's new

87b9f87

cln

2bcd650

cln unwanated diff

e0411d4

same

f640fd9

Merge remote-tracking branch 'upstream/master' into cd-nmf-openmp

7fa1c5f

TomDLT reviewed Feb 13, 2020

doc/whats_new/v0.23.rst (outdated, resolved)

jeremiedbb added 6 commits February 14, 2020 10:47

cln

8e38c03

simplify

fa28b12

cln

0a970d1

less diff with master

693483f

clarify what's new

23a8dbc

note about openmp based parallelism in UG

666f71b

Base automatically changed from master to main January 22, 2021 10:52

thomasjpfan added the cython label Apr 13, 2021

cmarmo mentioned this pull request May 2, 2022

[MRG] Parallelization _update_cdnmf_fast , fast nmf #6641

Closed

jeremiedbb mentioned this pull request Apr 25, 2024

Configure OpenBLAS to use scikit-learn's OpenMP threadpool #28883

Open
