[MRG] Parallelization of _update_cdnmf_fast, fast NMF #6641
Conversation
Any benchmarks demonstrating the gains?
# gradient = GW[t, i] where GW = np.dot(W, HHt) - XHt
grad = -XHt[i, t]
# grad = -XHt[i, t]
You should remove the commented lines.
The standard way that we support parallel computing in scikit-learn is via joblib (here you would use the "threading" backend). It is robust to various compilers (not every compiler implements OpenMP), and gives an explicit way of controlling the number of CPUs used. Any reason that you didn't do it in this PR? We would need robustness to compiler support and explicit control of the number of CPUs used to include this in scikit-learn.
Also, you have failing tests on some architectures.
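For reference, here is a minimal sketch of the joblib pattern described above, with an illustrative row-chunked update (the helper name and the chunking are mine, not from this PR); the "threading" backend only pays off once the worker body releases the GIL, e.g. in compiled Cython:

import numpy as np
from joblib import Parallel, delayed

def _update_rows(W, HHt, XHt, rows):
    # Toy stand-in for a per-row projected update on a shared W;
    # a real speed-up would need a GIL-releasing compiled body.
    for i in rows:
        W[i] = np.maximum(W[i] - (W[i] @ HHt - XHt[i]) / np.diag(HHt), 0)

rng = np.random.RandomState(0)
W, H = rng.rand(1000, 8), rng.rand(8, 500)
X = W @ H
HHt, XHt = H @ H.T, X @ H.T

# "threading" backend: workers share memory (W is updated in place) and
# n_jobs gives explicit, compiler-independent control of the CPU count.
Parallel(n_jobs=4, backend="threading")(
    delayed(_update_rows)(W, HHt, XHt, chunk)
    for chunk in np.array_split(np.arange(W.shape[0]), 4)
)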
My intuition is that joblib isn't appropriate for parallelizing tight loops. In any case, I would also be interested in speed-up figures and, if @macg0406 has time, an OpenMP vs. joblib comparison.
The main problem preventing us from using OpenMP in the inner loops of scikit-learn is the bad interaction (silent freeze) with the multiprocessing used by joblib, for instance in a GridSearchCV wrapper. More details here: This will be solved by a new process pool manager we are currently working on at https://github.com/tomMoral/loky, which is planned to replace the default multiprocessing backend of joblib. This is not ready yet though.
@ogrisel can you maybe create an issue on that to track it? I think we definitely need to put some work into getting OpenMP to work, but I don't really have any insight into what's happening right now.
+1! Very interested in this too.
Done in #7650.
Any chance you could merge master here @macg0406? OpenMP is now supported, so this PR from 2016 becomes possible. cc @jeremiedbb
I made some comments. Also, all the commented lines need to be removed.
But after a quick profiling pass, it appears that this is not the critical part of the algorithm at all (see below). Although it doesn't hurt performance to use a prange here, the gain is very small.
Edit: I only had 2 components. With more components it does become the critical part.
cdef double grad = 0 - xht
cdef double pg
for r in range(n_components):
    # for(int r =0;r<n_components;r++)
missing cdef Py_ssize_t r
from cython.parallel import prange

cdef inline double _update_cdnmf_samples(unsigned n_components, double xht, double [] HHt, double[] W, double hess, unsigned t) nogil:
Following the previous declarations, n_components and r should be Py_ssize_t.
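For illustration, here is a sketch of the helper with the fixes asked for in these review comments applied (Py_ssize_t indices, pointer arguments, hess read from the HHt row instead of being passed in); this is my reading of the reviewers' intent, not the merged code:

from libc.math cimport fabs, fmax

cdef inline double _update_cdnmf_samples(Py_ssize_t n_components, double xht,
                                         double* HHt, double* W,
                                         Py_ssize_t t) nogil:
    # HHt points at row HHt[t, :] and W at row W[i, :] of the caller's arrays.
    # gradient = GW[t, i] where GW = np.dot(W, HHt) - XHt
    cdef double grad = -xht
    cdef double pg
    cdef Py_ssize_t r
    for r in range(n_components):
        grad += HHt[r] * W[r]
    # projected gradient, accumulated by the caller into the violation
    pg = min(0., grad) if W[t] == 0 else grad
    # the Hessian is the diagonal entry HHt[t, t] of the row pointed to
    if HHt[t] != 0:
        W[t] = fmax(W[t] - grad / HHt[t], 0.)
    return fabs(pg)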
def _update_cdnmf_fast(double[:, ::1] W, double[:, :] HHt, double[:, :] XHt,
                       Py_ssize_t[::1] permutation):
    cdef double violation = 0
    cdef Py_ssize_t n_components = W.shape[1]
    cdef Py_ssize_t n_samples = W.shape[0]  # n_features for H update
    cdef double grad, pg, hess
    cdef double pg, hess
pg is not used any more
if hess != 0:
    W[t] -= grad / hess
    if W[t] < 0:
        W[t] = 0
W[t] = fmax(W[t], 0) is probably a bit faster
with nogil:
    for s in range(n_components):
        t = permutation[s]
        # Hessian
        hess = HHt[t, t]
No need to define hess here. It can be done directly in _update_cdnmf_samples.
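A sketch of the matching driver, with the prange over samples and hess folded into the helper as suggested; contiguous memoryviews are assumed here so that the row pointers are valid (again an illustration, not the merged code):

from cython.parallel import prange

def _update_cdnmf_fast(double[:, ::1] W, double[:, ::1] HHt,
                       double[:, ::1] XHt, Py_ssize_t[::1] permutation):
    cdef double violation = 0
    cdef Py_ssize_t n_components = W.shape[1]
    cdef Py_ssize_t n_samples = W.shape[0]  # n_features for the H update
    cdef Py_ssize_t i, s, t

    with nogil:
        for s in range(n_components):
            t = permutation[s]
            # each iteration touches only row i of W, so the rows can be
            # updated in parallel; violation is handled as a prange reduction
            for i in prange(n_samples):
                violation += _update_cdnmf_samples(
                    n_components, XHt[i, t], &HHt[t, 0], &W[i, 0], t)
    return violation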
cdef inline double _update_cdnmf_samples(unsigned n_components, double xht, double [] HHt, double[] W, double hess, unsigned t) nogil:
I think double* W and double* HHt would be more clear.
cdef Py_ssize_t i, r, s, t
r is unused here now.
Should we introduce an n_jobs parameter to NMF for the prange?
Gut feeling: +1
+1 for n_jobs.
@macg0406 It's finally been decided not to expose an n_jobs parameter for that. However, the number of threads for the prange needs to be given by a helper.
Are you still willing to work on this? If you don't have time I can take over.
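For context, a sketch of what that could look like at the call site. I'm assuming the helper meant is scikit-learn's internal _openmp_effective_n_threads (the one the superseding work relies on), so treat the names and the extended signature as assumptions:

# Hypothetical call site: the thread count comes from a helper rather than
# from an n_jobs parameter on NMF. _openmp_effective_n_threads respects
# OMP_NUM_THREADS and falls back to 1 when OpenMP is unavailable.
from sklearn.utils._openmp_helpers import _openmp_effective_n_threads

n_threads = _openmp_effective_n_threads()
# the Cython routine would forward this to prange(..., num_threads=n_threads)
violation = _update_cdnmf_fast(W, HHt, XHt, permutation, n_threads)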
A lot of things have changed on the parallelization side and this PR is superseded by #16439. I'm closing it.
Parallelized the function _update_cdnmf_fast by using prange/OpenMP, so that the NMF fitting process is faster. The environment variable OMP_NUM_THREADS can be used to set the maximum number of threads used.