Draft implementation of MiniBatchNMF #13326


Closed
pcerda wants to merge 6 commits

Conversation


@pcerda commented Feb 28, 2019

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Implementation of a mini-batch NMF with multiplicative updates.
#13308
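
For context, mini-batch NMF builds on multiplicative updates of the classic Lee & Seung form. A minimal Frobenius-loss sketch for illustration (the PR itself follows the online Itakura-Saito variant cited below):

import numpy as np

def multiplicative_update_H(X, W, H, eps=1e-10):
    # Lee & Seung multiplicative update for the Frobenius loss:
    #   H <- H * (W.T @ X) / (W.T @ W @ H)
    # eps guards against division by zero.
    return H * (W.T @ X) / (W.T @ W @ H + eps)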

Any other comments?

@amueller
Member

In the paper, the mini-batch algorithm is surprisingly not that much faster. That's kind of odd.
Generally, adding a new class for something with 70 citations is not something we usually do.

@GaelVaroquaux
Member

The paper is the following:

Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence, A. Lefèvre, F. Bach, C. Févotte.
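
In that paper's online scheme, the numerator and denominator of the multiplicative update of the shared factor are aggregated over mini-batches with a forgetting factor, which appears to be what self.A, self.B and self.rho track in this PR. A minimal sketch for the Frobenius case (the paper targets the Itakura-Saito divergence; names are illustrative):

import numpy as np

def online_update_W(X_batch, H_batch, W, A, B, rho, eps=1e-10):
    # H_batch holds the per-sample activations of the current mini-batch,
    # W is the shared factor; A and B aggregate the numerator and
    # denominator statistics, discounted by the forgetting factor rho.
    A = rho * A + H_batch.T @ X_batch   # running numerator statistics
    B = rho * B + H_batch.T @ H_batch   # running denominator statistics
    W *= A / (B @ W + eps)              # multiplicative step on W
    return W, A, B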


@GaelVaroquaux left a comment


A bunch of cosmetic comments.

self.W, self.A, self.B = self._init_W(X)
# self.rho = self.r**(self.batch_size / n_samples)
# else:
# not implemented yet
Member


We'll have to implement this for dense inputs.
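
E.g., a minimal dispatch sketch (the method name is hypothetical); the dense branch is the part still to be written:

import scipy.sparse as sp

if sp.issparse(X):
    H = self._e_step_sparse(X)  # existing sparse path (hypothetical name)
else:
    raise NotImplementedError("Dense input is not supported yet.")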

return H


def mini_batch(iterable1, iterable2, n=1):
Member


Can you use sklearn.utils.gen_batches instead of this function?
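
For reference, gen_batches yields slice objects covering n_samples in chunks of batch_size:

import numpy as np
from sklearn.utils import gen_batches

X = np.random.rand(1000, 20)
for batch in gen_batches(X.shape[0], batch_size=512):
    # batch is a slice object: slice(0, 512), then slice(512, 1000)
    X_batch = X[batch]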

Ht_W = _special_sparse_dot(Ht, W, Vt)
Ht_W_data = Ht_W.data
np.divide(Vt.data, Ht_W_data, out=Ht_W_data, where=(Ht_W_data != 0))
self.rho = self.r ** (1 / (iter + 1))
Member

@GaelVaroquaux Feb 28, 2019


Any attribute that is not specified in __init__ but is instead a result of fit should have a name that ends with an underscore: self.rho -> self.rho_. The same applies to many other attributes of this object.
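
A minimal illustration of the convention:

from sklearn.base import BaseEstimator

class MiniBatchNMF(BaseEstimator):
    def __init__(self, r=0.001):
        self.r = r  # hyperparameter set in __init__: no trailing underscore

    def fit(self, X, y=None):
        # attributes computed during fit get a trailing underscore
        self.rho_ = self.r ** 0.5  # illustrative formula only
        return self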

# not implemented yet

n_batch = (n_samples - 1) // self.batch_size + 1
self.iter = 1
Member


This variable should probably be named "self.n_iter_".

if i == n_batch-1:
    W_change = np.linalg.norm(
        self.W - W_last) / np.linalg.norm(W_last)
    if (W_change < self.tol) and (iter >= self.min_iter - 1):
Member


"iter" should be renamed to "n_iter" as "iter" is a function in Python.

@GaelVaroquaux
Member

Can you also start working on tests?

@GaelVaroquaux
Member

@pcerda: can you add your benchmarking code? We'll need it to profile / benchmark.

ngram_range: tuple, default=(2, 4)

init: str, default 'k-means++'
    Initialization method of the W matrix.
Contributor


Perhaps add 2-3 words on what the W matrix is in the NMF formulation.
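
E.g., illustrative wording only (which factor W denotes in X ~= W H depends on the convention used in this PR):

init: str, default 'k-means++'
    Initialization method for W, one of the two factors of the
    low-rank decomposition X ~= W H learned by the estimator.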


Parameters
----------

Contributor


Isn't an empty line here against convention?


def __init__(self, n_components=10, batch_size=512,
r=.001, init='k-means++',
tol=1E-4, min_iter=2, max_iter=5, ngram_range=(2, 4),
Contributor


The default for tol does not match the one mentioned in the docstring.

Ht_out = Ht * safe_sparse_dot(Ht_W, W_WT1.T)
squared_norm = np.linalg.norm(
    Ht_out - Ht) / (np.linalg.norm(Ht) + 1E-10)
Ht[:] = Ht_out
Contributor


Is the [:] there to make sure that the same Ht object is used in each iteration?
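
For reference, Ht[:] = Ht_out copies the new values into the existing buffer instead of rebinding the name, so any other reference to Ht sees the update:

import numpy as np

Ht = np.ones(3)
view = Ht              # second reference to the same buffer

Ht[:] = np.zeros(3)    # in-place copy: view sees the new values
assert view[0] == 0.0

Ht = np.full(3, 2.0)   # rebinding: view still holds the old buffer
assert view[0] == 0.0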

self.W_, self.A_, self.B_ = self._m_step(
    X[slice], self.W_, self.A_, self.B_, H[slice], self.iter)
self.iter += 1
if i == n_batch-1:
Contributor


PEP8? My feeling is to add spaces around "-".


Parameters
----------
X : array-like (str), shape [n_samples,]
Contributor


Is n_features missing in the [] brackets?

break
previous_error = error
if beta_loss < 1:
    W[slice][W[slice] < np.finfo(np.float64).eps] = 0.
Contributor


consider splitting into two lines for ++readability
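
E.g.:

eps = np.finfo(np.float64).eps
W[slice][W[slice] < eps] = 0.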

@@ -1297,6 +1328,37 @@ def fit(self, X, y=None, **params):
self.fit_transform(X, **params)
return self

def partial_fit(self, X, y=None, **params):
Contributor


TODO: add docstring
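
A possible numpydoc sketch (wording illustrative, not from the PR):

def partial_fit(self, X, y=None, **params):
    """Update the factorization with a single mini-batch of samples.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Mini-batch of the data matrix to decompose.
    y : Ignored

    Returns
    -------
    self : object
        The estimator, updated in place.
    """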

@amueller
Member

amueller commented Aug 6, 2019

What's the relationship between this PR and #13386?

@GaelVaroquaux
Member

I think that #13386 is a second design that we thought was cleaner. @pcerda, can you confirm?

We'll need to finish this, by the way :). Maybe @TwsThomas will be able to pitch in.

@amueller added the Superseded (PR has been replaced by a newer PR) label Aug 12, 2019
Base automatically changed from master to main January 22, 2021 10:50
@ogrisel
Member

ogrisel commented Jan 5, 2022

Closing in favor of #16948.

@ogrisel closed this Jan 5, 2022
Labels
module:decomposition, Superseded (PR has been replaced by a newer PR)
5 participants