[MRG] Online implementation of non-negative matrix factorization #16948

cmarmo · 2020-04-17T13:19:03Z

Reference Issues/PRs

Continues #13386
Aim to fix #13308, fix #13326.

What does this implement/fix? Explain your changes.

Implement online non-negative matrix factorization, following
"Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence", A. Lefevre, F. Bach, C. Févotte, 2011.

sklearn/decomposition/_nmf.py

cmarmo · 2020-08-30T13:50:04Z

@GaelVaroquaux, @jeremiedbb this is still WIP, but I feel its status needs an update.
I have rerun the benchmarks for commit 0020eb6; the result is below.

With respect to the previous implementation:

- I am no longer hardcoding the number of iterations for the minibatch algorithm: the convergence time is still better than for the classical NMF;
- I have introduced the forgetting factor: a 'weight' to be applied to old batches (hardcoded for now).

I have also checked that the minibatch algorithm gives the same results as the single batch one, when the batch size is set to the number of samples (I have added a test for that).
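
The equivalence check described above can be illustrated with a toy version. This is only a sketch, not the code in this PR: it assumes the Frobenius loss, plain multiplicative updates and no forgetting factor, and the helper names (`update_w`, `batch_nmf`, `minibatch_nmf`) are made up for the example.

```python
import numpy as np

EPS = np.finfo(np.float64).eps


def update_w(X, W, H):
    """One multiplicative update of W for the Frobenius loss, with H fixed."""
    return W * (X @ H.T) / np.maximum(W @ (H @ H.T), EPS)


def batch_nmf(X, W, H, n_iter):
    """Classical multiplicative-update NMF on the full data set."""
    W, H = W.copy(), H.copy()
    for _ in range(n_iter):
        W = update_w(X, W, H)
        H = H * (W.T @ X) / np.maximum((W.T @ W) @ H, EPS)
    return W, H


def minibatch_nmf(X, W, H, n_iter, batch_size):
    """Same updates, applied to consecutive slices of the samples."""
    W, H = W.copy(), H.copy()
    n_samples = X.shape[0]
    for _ in range(n_iter):
        for start in range(0, n_samples, batch_size):
            sl = slice(start, start + batch_size)
            W[sl] = update_w(X[sl], W[sl], H)
            H = H * (W[sl].T @ X[sl]) / np.maximum((W[sl].T @ W[sl]) @ H, EPS)
    return W, H


rng = np.random.RandomState(42)
X = np.abs(rng.randn(12, 6))
W0 = np.abs(rng.randn(12, 3))
H0 = np.abs(rng.randn(3, 6))

W_full, H_full = batch_nmf(X, W0, H0, n_iter=50)
# A single batch that covers every sample.
W_mini, H_mini = minibatch_nmf(X, W0, H0, n_iter=50, batch_size=X.shape[0])
same = np.allclose(W_full, W_mini) and np.allclose(H_full, H_mini)
```

When `batch_size` equals the number of samples there is a single slice per pass, so both loops perform the same arithmetic and `same` is `True`.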

From the plot, the loss is greater in the minibatch implementation, although in some cases it seems comparable. I am planning to investigate the role of the forgetting factor in the loss: from Févotte et al. it seems that this factor, and hence a good solution, depends on the number of samples and the batch size.
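
To make the role of the forgetting factor concrete, here is a minimal sketch of one way it could enter the update of H. It assumes the Frobenius loss and running numerator/denominator statistics `A` and `B`; the function name and the rescaling `rho = forget_factor ** (batch_size / n_samples)` are assumptions for illustration, not necessarily what this PR does.

```python
import numpy as np

EPS = np.finfo(np.float64).eps


def forgetful_h_update(X_batch, W_batch, H, A, B, rho):
    """Update H from one mini-batch, down-weighting older batches by rho."""
    numerator = H * (W_batch.T @ X_batch)
    denominator = (W_batch.T @ W_batch) @ H
    A = rho * A + numerator      # discounted running numerator
    B = rho * B + denominator    # discounted running denominator
    return A / np.maximum(B, EPS), A, B


rng = np.random.RandomState(0)
X_batch = np.abs(rng.randn(4, 6))
W_batch = np.abs(rng.randn(4, 3))
H = np.abs(rng.randn(3, 6))
A, B = H.copy(), np.ones_like(H)  # assumed initial statistics

# Assumed rescaling: the weight depends on the batch size and sample count.
n_samples, batch_size, forget_factor = 100, 4, 0.7
rho = forget_factor ** (batch_size / n_samples)

H_forget, A_new, B_new = forgetful_h_update(X_batch, W_batch, H, A, B, rho)
H_plain, _, _ = forgetful_h_update(X_batch, W_batch, H, A, B, rho=0.0)
```

With `rho = 0` the history is discarded and the update reduces to the usual multiplicative rule; with `rho` close to 1, old batches keep almost all of their weight.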

Here is what is still needed for this PR:

- investigate the role of the forgetting factor in the loss estimation (~~and improve it, hopefully~~ it turns out that the expected loss should be investigated instead).
- make the forgetting factor a parameter and suggest possible optimizations depending on the dimensions of the problem
- avoid code duplication (non_negative_factorization and non_negative_factorization_online are almost the same function right now)
- improve tests (I am open to suggestions for testing those lines)
- write documentation (!)

Thanks for listening, and let me know if you have any comments or suggestions.

Co-authored-by: Patricio Cerda <pcerda>

adrinjalali

I did not check the math; the rest looks pretty good.

doc/whats_new/v1.1.rst

sklearn/decomposition/_nmf.py

adrinjalali · 2022-04-07T14:03:09Z

sklearn/decomposition/_nmf.py

+        return self
+
+    def _solve_W(self, X, H, max_iter):
+        """Minimize the objective function w.r.t W"""


It would make it easier for somebody like me to read/review if we could explain what these methods do in the docstrings.

I tried to make it more explicit. Let me know what you think

adrinjalali · 2022-04-07T14:04:38Z

sklearn/decomposition/_nmf.py

+        return W
+
+    def partial_fit(self, X, y=None, W=None, H=None):
+        """Update the model using the data in X as a mini-batch.


Could we explain how a user could use partial_fit on a dataset to get the same result as running fit on its entirety? It would make it easier for people to decide how to use it.

I improved the docstring and linked to the doc of incremental learning. Maybe we could add a section there to explain with more details how to use partial_fit, but I think it should be done in a separate PR since it concerns many estimators.

glemaitre

Just posted the comments that I had from before. You can discard any that are outdated now.

sklearn/decomposition/_nmf.py

glemaitre · 2022-04-07T13:53:09Z

sklearn/decomposition/_nmf.py

+        self._check_params(X)
+
+        if X.min() == 0 and self._beta_loss <= 0:
+            raise ValueError(


Apparently, we don't check for this error in the tests either.

Added a test
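
For context on why such a guard is useful, the sketch below shows that the Itakura-Saito divergence (the `beta_loss = 0` case) becomes infinite as soon as X contains a zero. The function names and the error message are hypothetical and written only for this example; the message in the actual code is not reproduced here.

```python
import numpy as np


def check_beta_loss_input(X, beta_loss):
    """Hypothetical stand-alone version of the guard shown in the diff."""
    if X.min() == 0 and beta_loss <= 0:
        # Illustrative message, not the one used in the library.
        raise ValueError("X contains zeros, incompatible with beta_loss <= 0")


def itakura_saito(X, Y):
    """Itakura-Saito divergence: sum of X/Y - log(X/Y) - 1."""
    with np.errstate(divide="ignore"):
        ratio = X / Y
        return np.sum(ratio - np.log(ratio) - 1)


X = np.array([[0.0, 1.0], [2.0, 3.0]])
Y = np.ones_like(X)

# log(0) is -inf, so the zero entry makes the whole sum infinite.
divergence = itakura_saito(X, Y)

try:
    check_beta_loss_input(X, beta_loss=0)
    raised = False
except ValueError:
    raised = True
```

A test for the guard would then amount to asserting that the error is raised for data with zeros and a non-positive `beta_loss`.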

sklearn/decomposition/_nmf.py

glemaitre

A review mainly about nitpicks on the documentation just for the format.

doc/modules/decomposition.rst

glemaitre · 2022-04-20T12:12:09Z

sklearn/decomposition/_nmf.py

+
+        .. math::
+
+            0.5 * ||X - WH||_{loss}^2


I don't know if . would be better than * for the multiplication.

We use * everywhere. Let's keep consistency
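
For the Frobenius case, the data-fit term written in the docstring above can be computed directly. A minimal sketch (the function name is illustrative, and other `beta_loss` values would need a different divergence):

```python
import numpy as np


def frobenius_data_fit(X, W, H):
    """Return 0.5 * ||X - WH||_Fro^2, the data-fit term for the Frobenius loss."""
    return 0.5 * np.linalg.norm(X - W @ H, ord="fro") ** 2


X = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.eye(2)
H = np.zeros((2, 2))

# With WH = 0 the term is 0.5 * (1 + 4 + 9 + 16) = 15, up to rounding.
value = frobenius_data_fit(X, W, H)

# A perfect reconstruction gives a zero data-fit term.
perfect = frobenius_data_fit(X, W, X)
```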

sklearn/decomposition/_nmf.py

glemaitre

LGTM once the doc nitpicks are included.

glemaitre · 2022-04-20T13:57:45Z

sklearn/decomposition/tests/test_nmf.py

+    batch_size = 3
+    max_iter = 1000
+
+    rng = np.random.mtrand.RandomState(42)


Just a question here: do we want to add support for the global random state fixture?

Given the mess it caused so far, I'd rather do that very carefully in a follow-up PR :)

examples/applications/plot_topics_extraction_with_nmf_lda.py

glemaitre

LGTM.

jeremiedbb · 2022-04-22T15:53:50Z

Thanks @cmarmo !

glemaitre · 2022-04-22T15:56:56Z

Thanks everyone.

jjerphan · 2022-04-22T16:18:27Z

Thanks to everyone involved.

…t-learn#16948) Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org> Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

cmarmo mentioned this pull request Apr 17, 2020

WIP Modifying nmf.py for accepting mini-batches #13386

Closed

github-actions bot added the module:decomposition label Apr 17, 2020

GaelVaroquaux reviewed Jun 25, 2020

sklearn/decomposition/_nmf.py (outdated)

GaelVaroquaux reviewed Jun 25, 2020

sklearn/decomposition/_nmf.py (outdated)

cmarmo added 22 commits August 3, 2020 14:03

Merge branch 'master' into modified_nmf_for_minibatch

3694255

Sum batch iterations to iterations.

97082c7

Debugging.

5cc9949

Merge branch 'master' into modified_nmf_for_minibatch

e852ad2

Debug

7d75d30

Some improvements.

753e6f6

Add hardcoded forgetting factor.

cd28014

Sync with upstream.

88fc02c

Fix index.

d5ad09a

Various testing.

6b8969f

Same results for NMF and onlineNMF for batch_size=n_samples.

2a7d316

Merge branch 'master' into modified_nmf_for_minibatch

09e50e3

Linting.

172d097

Linting in benchmarks.

921bd33

Fix number of iterations.

03867c2

Clean parameters.

f58900c

Remove transform and inverse_transform function.

e2be821

Fix references.

0020eb6

Add tests.

05d6010

Fix lint errors in tests.

8c7a3fb

Add one more test.

e4c1e23

Fix import.

6b930d9

cmarmo added 3 commits August 31, 2020 10:56

Merge branch 'master' into modified_nmf_for_minibatch

4c8bf0a

Remove duplicated code.

8f54700

Lint.

6b99b95

glemaitre self-requested a review March 22, 2022 15:09

jeremiedbb added 2 commits March 25, 2022 15:26

address comments

dad2eb2

credit pcerda

a027686

Co-authored-by: Patricio Cerda <pcerda>

adrinjalali reviewed Apr 7, 2022

jeremiedbb added 6 commits April 8, 2022 13:42

update what's new entry

ce646d7

test beta_loss > 2

616f9ba

improve solve_W docstring

b6681f8

improve partial_fit docstring

0922eb3

don't introduce new warnings in tests

051fa8e

lint

0094d4f

glemaitre reviewed Apr 20, 2022

glemaitre approved these changes Apr 20, 2022

jeremiedbb and others added 6 commits April 21, 2022 13:15

Merge remote-tracking branch 'upstream/main' into pr/cmarmo/16948

7978db1

address review comments

a7ef482

lint

c77de85

fix position in what's new

e2510ec

better format obj function in docstring

3ecf370

Merge branch 'main' into modified_nmf_for_minibatch

15ead2e

glemaitre reviewed Apr 22, 2022

examples/applications/plot_topics_extraction_with_nmf_lda.py

glemaitre approved these changes Apr 22, 2022

jeremiedbb added 3 commits April 22, 2022 16:21

Merge remote-tracking branch 'upstream/main' into pr/cmarmo/16948

9abf0c9

avoid convergence warning in example

5790e5f

Merge branch 'modified_nmf_for_minibatch' of https://github.com/cmarm…

99ec76c

…o/scikit-learn into pr/cmarmo/16948

glemaitre merged commit 69132eb into scikit-learn:main Apr 22, 2022

cmarmo deleted the modified_nmf_for_minibatch branch April 22, 2022 20:33
