
[WIP] NMF estimator based on the Lee and Seung algorithm #2540


Closed
larsmans wants to merge 3 commits into scikit-learn:master from larsmans:lee-seung-nmf

Conversation

@larsmans (Member)

Based on code by @vene. Needs tests and docs. The benchmark needs to be updated, and usage of the NMF class elsewhere in the scikit must be checked.

Question: does it make sense at all to bootstrap this thing using NNDSVD?

@vene (Member) commented Oct 20, 2013

I think the initialization is not just about speed of convergence, but actually about where the algorithm converges to, as the problem is non-convex. You can check the reconstruction error for different initialization methods.

I will take a look at this exciting PR soon :)
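For illustration, a minimal sketch of that comparison using today's scikit-learn NMF API (the init options and the reconstruction_err_ attribute are assumed from the current estimator, not from the code in this PR):

```python
# Compare reconstruction error for different initializations; since the problem
# is non-convex, different inits can converge to different local optima.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)
X = np.abs(rng.randn(200, 50))  # non-negative toy data

for init in ("random", "nndsvd", "nndsvda"):
    model = NMF(n_components=10, init=init, random_state=0, max_iter=500)
    model.fit(X)
    print(init, model.reconstruction_err_)
```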


@amueller (Member)

I really thought that one would be slower... we should really test for a regression in the projected gradient one...

@mblondel (Member)

For NMF by the multiplicative method with KL divergence, see #1348

@larsmans (Member, Author)

Looks like that PR can be merged into this one as a loss argument to NMF. I'll see if I can find time to have a shot at it.

@ogrisel (Member) commented Oct 21, 2013

Would be interesting to experiment with a mini-batch SGD variant as well. It might even be faster than the batch gradient method, and furthermore it could accept a partial_fit method to make it possible to train out-of-core topic models.

However I am not sure whether / how the mini-batch size will impact the convergence.

@larsmans (Member, Author)

@ogrisel My thought exactly. Wikipedia has a page about online NMF with a few references; I wanted to read those before hacking up my own online version.

@vene (Member) commented Oct 21, 2013

How about putting this on the pycon.fr sprint list?

@vene (Member) commented Oct 21, 2013

unless of course @larsmans uses his magic and finishes it up first ;)


@larsmans (Member, Author)

When's that?

@ogrisel (Member) commented Oct 21, 2013

> How about putting this on the pycon.fr sprint list?

I would be glad to try to hack on this at pycon.fr even if @larsmans ends up implementing the final version. It's good to have several people in the project with a good practical understanding of the algos.

@ogrisel (Member) commented Oct 21, 2013

> When's that?

Next week, Monday and Tuesday in the beautiful city of Strasbourg: http://www.pycon.fr/2013/

@larsmans (Member, Author)

Yeah, I'd googled it :)

I'd very much appreciate any work on the NMF solvers; they can solve interesting text mining problems if they're fast enough.

@ogrisel (Member) commented Oct 21, 2013

+1. Also, sparse online solvers for NMF look like an interesting building block for recsys. Dense online NMF could be useful for signal processing, like audio classification / decomposition from STFT data.

@mblondel (Member)

The scikit is starting to become a bit inconsistent in the way we create classes.

  • one model, many learning algorithms (e.g., LinearSVC, LogisticRegression, Ridge)
  • one learning algorithm, many models / objectives (e.g., SGDClassifier)
  • one model / objective, one learning algorithm (e.g., Lasso)

I tend to prefer the first (the model parameters are the same, only the way we fit them changes), although from a code organization point of view the second can be more convenient.

I would like to leave the door open to add a coordinate descent solver for NMF. All major solvers (projected gradient, multiplicative method, CD) support both KL and squared losses; it's just a matter of implementing them.

@larsmans (Member, Author)

I've noticed this before, but then the machine learning community is inconsistent and sloppy in distinguishing problems, models and algorithms. IMHO, we should make pragmatic choices with the number of hyperparameters (and valid combinations of them) as a guideline, so as not to bother users with parameters that don't do anything. I've set up this NMF as a separate class without all the extra parameters that PG-NMF has.

@mblondel (Member)

> IMHO, we should make pragmatic choices with the number of hyperparameters (and valid combinations of them) as a guideline

Sounds good.

Would you care to rename NMF to MultiplicativeNMF?

@larsmans (Member, Author)

Good idea. But then what do we do with the current NMF synonym for ProjectedGradientNMF? Deprecate it? Leave it in?

@GaelVaroquaux (Member)

> • one model, many learning algorithms (e.g., LinearSVC, LogisticRegression, Ridge)
>
> I tend to prefer the first

So do I, however,

> (the model parameters are the same, only the way we fit them changes)

that's the limit: sometimes the algorithm heavily conditions the parameters. That's when I would break things up into different objects.

@GaelVaroquaux (Member)

>> IMHO, we should make pragmatic choices with the number of hyperparameters (and valid combinations of them) as a guideline
>
> Sounds good.

I feel the same.

> Would you care to rename NMF to MultiplicativeNMF?

Ouch, breakage again...

@larsmans (Member, Author)

No breakage. This PR actually breaks some code by reusing the NMF identifier for the new estimator. It's currently a synonym for the other estimator, see decomposition/nmf.py, near the bottom.

@mblondel (Member)

> Deprecate it? Leave it in?

I would deprecate it.

@GaelVaroquaux (Member)

> I would deprecate it.

Yes, but... It's really tiring to know that scikit-learn is not reliable in the mid term (6 months to a year) in terms of forward compatibility. We will have to stop doing this. We are breaking people's code for cosmetic reasons. It is fine when we are a young project focusing on getting guidelines right, but it's not fine for a mature project.

@larsmans (Member, Author)

I'll leave it in for this PR but remove all references to it in the documentation. We were never consistent about addressing that class, sometimes calling it NMF, sometimes ProjectedGradientNMF.

@ogrisel (Member) commented Oct 21, 2013

> Yes, but... It's really tiring to know that scikit-learn is not reliable in the mid term (6 months to a year) in terms of forward compatibility. We will have to stop doing this. We are breaking people's code for cosmetic reasons. It is fine when we are a young project focusing on getting guidelines right, but it's not fine for a mature project.

We could also just increase the length of the deprecation cycle to 1 or 2 years instead of 2 releases, to transition smoothly from a young project to a mature one while not freezing API or naming mistakes in stone.

@larsmans (Member, Author)

Docs done, now for the tests.

@larsmans (Member, Author)

Hm. The next problem is that some of the tests are quite unstable. Inserting more tests changes the RNG's state, and some of the PG-NMF tests start failing.

@ogrisel (Member) commented Oct 21, 2013

We should really seed each test explicitly to make them independent of the running order.
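For illustration, a minimal sketch of what that looks like (a hypothetical test, not one from the scikit-learn suite):

```python
# Seed each test with its own RandomState so that adding or reordering tests
# cannot change the data another test sees.
import numpy as np
from sklearn.decomposition import NMF


def test_nmf_reconstruction_is_reasonable():
    rng = np.random.RandomState(42)   # explicit per-test seed
    X = np.abs(rng.randn(30, 10))     # non-negative toy data
    model = NMF(n_components=5, init="nndsvda", random_state=0, max_iter=500)
    W = model.fit_transform(X)
    # The factorization should reduce the Frobenius error well below ||X||.
    assert np.linalg.norm(X - W @ model.components_) < np.linalg.norm(X)
```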

@larsmans (Member, Author)

I've done that for the tests where it's really a problem. I haven't run all of them individually, but the rest seem stable. Tests done; next up: check all examples that use NMF.

@mblondel (Member)

> Yes, but... It's really tiring to know that scikit-learn is not reliable in the mid term (6 months to a year) in terms of forward compatibility. We will have to stop doing this. We are breaking people's code for cosmetic reasons. It is fine when we are a young project focusing on getting guidelines right, but it's not fine for a mature project.

Indeed, but do we want to keep two different names to refer to the same class?

@mblondel (Member)

Sorry, I thought you meant the projection onto the positive semidefinite cone (the set of all positive semidefinite matrices), which is obviously not the same as the non-negative orthant.

@vene (Member) commented Oct 25, 2013

Something that keeps bugging me in NMF is regularization/sparsity constraints. IIRC the literature says that while the unregularized formulation is naturally somewhat sparse for nice datasets, sparsity usually needs to be enforced explicitly.

As far as I know the multiplicative update method cannot handle this, right?

The implementation in PG-NMF is currently very hackish and uses hstacks to adjust the cost function (sorry, it was some of my first numerical code!), and I also think it causes some slowdown and wasted memory.

It would be nice to hear some experience from the field regarding L1 or L2 regularization.

@mblondel (Member)

I think it's always a good idea to use regularization. In addition to making solutions sparser, it also makes the problem better conditioned. The projected gradient update without regularization is

    W = max(W - step_size * gradient, 0)

Now, if you use alpha * ||W||_1 (i.e., the L1 norm) as regularization, the gradient of the regularization term is a matrix with alpha everywhere, so the update becomes

    W = max(W - step_size * gradient - step_size * alpha, 0)

So it's easy to see that it is more likely that weights get truncated to zero.

I cannot comment for the multiplicative method as I don't know the method well.
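For illustration, a sketch of that step in NumPy (not the scikit-learn solver; the data-fit gradient shown is for the halved squared loss 0.5 * ||X - WH||_F^2):

```python
import numpy as np

def pg_step(X, W, H, step_size=1e-3, alpha=0.0):
    """One projected gradient update of W for 0.5 * ||X - WH||_F^2 + alpha * ||W||_1."""
    grad = (W @ H - X) @ H.T      # gradient of the data-fit term w.r.t. W
    grad = grad + alpha           # gradient of alpha * ||W||_1 on the non-negative orthant
    return np.maximum(W - step_size * grad, 0.0)   # project back onto W >= 0
```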

@larsmans (Member, Author)

OK, I could use some help here. I can't seem to get the common tests to pass, since (1) they pick up _BaseNMF even when there's an __all__ in the nmf.py file, and (2) the test that checks whether fit(X).transform(X) == fit_transform(X) fails for MultiplicativeNMF but not ProjectedGradientNMF. I don't understand this: my gut feeling is that it should fail in both cases or in neither, because they're doing the same thing.

I wrote a custom MultiplicativeNMF.transform that uses the multiplicative update algorithm with a fixed H, but this doesn't solve the problem.
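For reference, a minimal sketch of that kind of transform, using the plain Lee–Seung multiplicative rule for the Frobenius loss with the learned components H held fixed (an illustration of the idea, not the code in this PR):

```python
import numpy as np

def transform_fixed_H(X, H, n_iter=200, eps=1e-10):
    """Solve for W with H (the learned components) held fixed."""
    rng = np.random.RandomState(0)
    W = np.abs(rng.randn(X.shape[0], H.shape[0]))
    for _ in range(n_iter):
        # Lee-Seung update: W <- W * (X H^T) / (W H H^T)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W
```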

@coveralls

Coverage Status

Coverage remained the same when pulling 670cd6b on larsmans:lee-seung-nmf into 101f0ee on scikit-learn:master.

@larsmans (Member, Author)

@mblondel The L1-regularized version of the multiplicative algorithm is given by Hoyer 2002.
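For what it's worth, a hedged sketch of the form such an update is usually written in (the L1 penalty simply enters the denominator of the multiplicative rule; check Hoyer's paper for the exact algorithm and the companion W update):

```python
import numpy as np

def update_H_l1(X, W, H, l1_reg=0.1, eps=1e-10):
    """One multiplicative update of H for 0.5 * ||X - WH||_F^2 + l1_reg * sum(H)."""
    # H <- H * (W^T X) / (W^T W H + l1_reg)
    return H * (W.T @ X) / (W.T @ W @ H + l1_reg + eps)
```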

Inline review on this diff hunk:

    W *= update
    maxW = np.max(update)

    if i % 10 == 0:

Review comment (Member):

Isn't the 10 here pretty arbitrary? The max can be slow for large dense matrices, right?

Reply from @larsmans (Member, Author):

Yes, it's arbitrary, and I guess it doesn't skip much since I moved the max outside the check to reuse the update variable (there were updateH and updateW before). I'd have to do some timing to figure out which one is better.

@vene (Member) commented Oct 29, 2013

An example that plots reconstruction error over time for projected gradient and the multiplicative method would be nice. This can be done by setting the number of iterations to 1 and calling fit multiple times with the result of the last call as initialization for W and H. A callback mechanism would be nicer, but we have never agreed on an API.

Unfortunately the current API for NMF doesn't support warm restarts. It would be trivial to add but should we?

@vene (Member) commented Oct 29, 2013

More updates on the text benchmark and the faces benchmark: it turns out PG isn't so bad, but it's not fair to set the same tol for both algorithms, as they measure different things. A higher tol for PG leads to very good solutions and is faster than the multiplicative method.

For a fair comparison we need to somehow adjust the convergence criteria so that they match. I think the PG-NMF convergence criterion is the messed-up one, as you can see that with random init it sometimes "converges" to noise very quickly.

@vene (Member) commented Oct 29, 2013

I forgot to mention: the current snapshot of the benchmarks was run against a local branch where I made the multiplicative NMF check for convergence at every iteration, with the final number of iterations saved under self.n_iter_.

@larsmans (Member, Author)

@vene Do you think I should give up on this PR?

@vene (Member) commented Oct 29, 2013

Definitely not, but I think I found a nasty hidden bug in the projected gradient stopping condition. I'm still digging, and I think we should bench on MovieLens or some different scenario.


@vene (Member) commented Oct 29, 2013

Also from the benchmarks it seems that the multiplicative update is better at the moment for sparse multidimensional data, but we need to figure out how to properly compare the tols.


@mblondel (Member)

> Unfortunately the current API for NMF doesn't support warm restarts. It would be trivial to add but should we?

The convergence plot would also be useful for checking correctness of the implementations. How did you generate the convergence plots in your notebooks?

@vene (Member) commented Oct 30, 2013

The "dumb" way: for max_iter in range(...).

On the other hand this allows you to use the stopping condition, otherwise
with warm restarts I guess you will keep doing that one iteration, or
anyway more changes to the code would be needed.


@mblondel (Member)

I think I had a similar problem in lightning. When you use warm-start, the first iteration is already pretty good. Since the stopping criterion is relative to the first iteration, it therefore becomes too strict. One solution would be to store the first iteration's projected gradient norm in self.init_norm_.
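A tiny sketch of that idea (illustration only; init_norm_ is the hypothetical attribute suggested above, not an existing scikit-learn one):

```python
def should_stop(proj_grad_norm, model, tol=1e-4):
    """Stop when the projected gradient norm is small relative to the very first fit."""
    if not hasattr(model, "init_norm_"):
        # Remember the norm from the first (cold) fit, so warm-started fits
        # don't tighten the criterion by measuring against an already-good start.
        model.init_norm_ = proj_grad_norm
    return proj_grad_norm < tol * model.init_norm_
```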

@mblondel (Member)

The "dumb" way: for max_iter in range(...).

Alright, I guess this is the easiest way to do it for now. We can always add warm-start later (the dumb way doesn't scale to large datasets).
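For illustration, the kind of loop being described, written against today's scikit-learn NMF API (the estimator name and attributes here are assumptions, not this PR's MultiplicativeNMF):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)
X = np.abs(rng.randn(200, 50))  # non-negative toy data

iters = list(range(10, 310, 30))
errors = []
for n in iters:
    # Refit from scratch each time ("the dumb way"); warm starts would avoid this.
    model = NMF(n_components=10, init="nndsvd", random_state=0, max_iter=n)
    model.fit(X)
    errors.append(model.reconstruction_err_)

plt.plot(iters, errors, marker="o")
plt.xlabel("max_iter")
plt.ylabel("Frobenius reconstruction error")
plt.show()
```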

@vene (Member) commented Nov 28, 2013

There is also this paper about a way to speed up convergence of KL-divergence NMF by using diagonalized Newton updates; the author's (synthetic) benchmarks show it to be better than the coordinate descent version. This paper doesn't pass the scikit-learn criteria for inclusion, but I found it interesting nonetheless.

@amueller (Member)

Close based on #4811?

@ogrisel (Member) commented Aug 27, 2015

Let's close and move the discussion to #4811.
