[MRG+1] refactor NMF and add CD solver #4852
Conversation
I think it's awesome! And Travis is failing ;)
Thanks! Only 2/3 are failing :p
Nice. I don't have time to review, though.
    return _update_cdnmf_fast(Ht, WtW, WtX, alpha, l1_ratio, False)


def fit_coordinate_descent(X, W, H, tol=1e-4, max_iter=200, alpha=0.001,
Use a private function.
In the discussion on NMF, you said:

> Maybe we can keep the class API simple and expose a function with more options

This function adds the choice to regularize W, H, or both, whereas the class does not have this option and regularizes both. Is that what you meant?
The idea was to do as in ridge.py with `ridge_regression` and `Ridge` (i.e. the public function is common to all solvers).
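For reference, a minimal sketch of that ridge.py pattern (the function and solver names here are illustrative, not the final API of this PR):

```python
def _fit_projected_gradient(X, n_components):
    ...  # projected gradient solver (stub)


def _fit_coordinate_descent(X, n_components):
    ...  # coordinate descent solver (stub)


def non_negative_factorization(X, n_components, solver='cd'):
    """Public function, common to all solvers, like ridge_regression."""
    if solver == 'cd':
        return _fit_coordinate_descent(X, n_components)
    elif solver == 'pg':
        return _fit_projected_gradient(X, n_components)
    raise ValueError("Invalid solver parameter: %r" % solver)
```

The class (`NMF`) then stays a thin wrapper around the public function, just as `Ridge` wraps `ridge_regression`.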
> (i.e. the public function is common to all solvers)

OK, I'll do that.
I think we should remove projected gradient by v0.19. Otherwise, with your plan to add more loss functions, this will be way too much code to maintain. @vene, is that fine with you?
I think that was the plan; I will add a deprecation.
Based on the plots above, which aren't surprising, yes, I fully agree that if we merge CD we can deprecate PG.
for when sparsity is not desired)
'random': non-negative random matrices, scale with:
    sqrt(X.mean() / n_components)
'uniform': matrices filled with the same value:
I don't see where the code path for 'uniform' currently is. Is 'uniform' supposed to be different from 'random'?
Also, @mblondel's gist used a random subset of the rows/cols of the input matrix as initialization; would this make sense? We could run some benchmarks for this too.
'uniform' is initialization with the same value everywhere, but I removed it in this PR.
I will open a specific PR to address the initialization schemes and run some benchmarks.
I will also look at the random subset of rows/cols.
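For concreteness, the 'random' scheme described in the docstring amounts to something like this (a sketch; the exact random distribution used in the PR may differ, and `_random_init` is a hypothetical helper name):

```python
import numpy as np


def _random_init(X, n_components, random_state=None):
    """Non-negative random matrices, scaled with
    sqrt(X.mean() / n_components) so that W @ H has roughly
    the same magnitude as X."""
    rng = np.random.RandomState(random_state)
    avg = np.sqrt(X.mean() / n_components)
    W = avg * rng.rand(X.shape[0], n_components)
    H = avg * rng.rand(n_components, X.shape[1])
    return W, H
```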
    raise ValueError('Array with wrong shape passed to %s. Expected %s, '
                     'but got %s ' % (whom, shape, np.shape(A)))
check_non_negative(A, whom)
if np.max(A) == 0:
I remember that in one of the papers they say full rows/cols of zeros can cause problems; is that the case in this implementation?
With a full col/row of zeros in W or H, WtW or HHt will have a zero on its diagonal, which is a problem since we will divide by zero.
To avoid that, I added a small L2 regularization (1e-15).
I think this is not the right way to do it. In this case, the optimal solution is probably to not update the coefficient.
i.e., skip the current iteration
OK, I changed it.
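For concreteness, here is a pure-Python sketch of the agreed fix (the helper `_update_column` and the precomputed `HHt = H @ H.T`, `XHt = X @ H.T` are illustrative names, assuming the objective is `0.5 * ||X - W @ H||**2`):

```python
import numpy as np


def _update_column(W, HHt, XHt, t):
    """One coordinate descent update of column t of W."""
    if HHt[t, t] == 0.:
        # a full row of zeros in H zeroes this diagonal entry;
        # skip the update instead of adding a tiny L2 term
        return
    # gradient of the residual term with respect to column t of W
    grad = np.dot(W, HHt[:, t]) - XHt[:, t]
    # projected Newton step, clipped to the non-negative orthant
    W[:, t] = np.maximum(W[:, t] - grad / HHt[t, t], 0.)
```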
int max_inner, double tol):
    cdef double violation = 0.
    cdef double pg
    cdef int n_length = W.shape[0]
n_features?
W is (n_samples, n_components) and H.T is (n_features, n_components).
I changed the name to be clearer:
cdef int n_samples = W.shape[0]  # n_features for H update
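That naming works because the same routine updates both factors by transposition, so its first dimension is n_samples for the W update and n_features for the H update. Reusing the `_update_column` sketch from above (again a sketch, not the PR's exact code):

```python
import numpy as np

rng = np.random.RandomState(0)
X = np.abs(rng.randn(6, 4))    # (n_samples, n_features)
W = np.abs(rng.randn(6, 2))    # (n_samples, n_components)
Ht = np.abs(rng.randn(4, 2))   # (n_features, n_components), i.e. H.T

# W update: the routine's first dimension is n_samples
HHt, XHt = np.dot(Ht.T, Ht), np.dot(X, Ht)
for t in range(W.shape[1]):
    _update_column(W, HHt, XHt, t)

# H update: transpose X and swap the factors, so the same
# first dimension now plays the role of n_features
WtW, XtW = np.dot(W.T, W), np.dot(X.T, W)
for t in range(Ht.shape[1]):
    _update_column(Ht, WtW, XtW, t)
```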
Really nice. Just one thing: can you summarize your conclusions when you post figures? Thanks.
I notice that the greedy curves are sometimes longer on the X axis, even if they reach the same loss. Is this because the convergence criterion fires later, or because iterations take longer? The paper seemed to advertise the greedy selection as being computationally very cheap.
Iterations take longer. I used the same number of iterations for CD and GCD, with a low tolerance so as to stop only when the maximum number of iterations is reached.
The improvements are not so impressive overall (~1 s gain). The author released the source code in MATLAB (with the computationally expensive parts written in C); you could have a look to check for potential tricks. In the lower-right plot of the n_components=10 case (news20), the loss difference looks very bad; I wonder if it's not just a scale problem.
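For readers following along, the greedy selection under discussion works roughly like this (a pure-Python sketch of the idea from the GCD paper, not the benchmarked code; `_greedy_pick` is a hypothetical helper). The per-pick search over all coordinates is exactly the overhead that makes GCD iterations longer than cyclic CD ones:

```python
import numpy as np


def _greedy_pick(w_row, HHt, xht_row):
    """For one row of W, pick the coordinate whose projected Newton
    step decreases the objective the most."""
    grad = np.dot(HHt, w_row) - xht_row
    hess = np.maximum(np.diag(HHt), 1e-12)  # guard against zero diagonal
    step = np.maximum(w_row - grad / hess, 0.) - w_row
    # second-order estimate of the objective decrease for each coordinate
    decrease = -(grad * step + 0.5 * hess * step ** 2)
    return np.argmax(decrease)
```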
l2_reg = (1 - l1_ratio) * alpha

# L2 regularization corresponds to increasing the diagonal of HHt
HHt = np.dot(Ht.T, Ht) + np.eye(n_components) * l2_reg
I noticed that you used fast_dot in some places. You could potentially use it here too.
It was to be sure to have a C-contiguous array, but since I removed the check in Cython, I will change it.
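For context, the elastic-net split this hunk implements looks like the following (a sketch; how `l1_reg` enters the gradient is only hinted at here, and `np.dot` stands in for `fast_dot`):

```python
import numpy as np

rng = np.random.RandomState(0)
Ht = np.abs(rng.randn(8, 5))       # H.T: (n_features, n_components)
n_components = Ht.shape[1]
alpha, l1_ratio = 0.1, 0.5

l1_reg = alpha * l1_ratio          # enters the gradient directly
l2_reg = alpha * (1 - l1_ratio)    # enters the diagonal of HHt

HHt = np.dot(Ht.T, Ht) + np.eye(n_components) * l2_reg
```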
:class:`NMF` can also be initialized with random non-negative matrices, by
passing an integer seed or a ``RandomState`` to :attr:`init`.
:class:`NMF` can also be initialized with correctly scaled random non-negative
matrices by setting :attr:`init="random"`. An integer seed or a ``RandomState`` can also be passed to :attr:`random_state` to control reproducibility.
line too long
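As a quick usage example of the documented behavior (these are real `NMF` parameters):

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.abs(np.random.RandomState(0).randn(10, 6))
model = NMF(n_components=3, init='random', random_state=0)
W = model.fit_transform(X)   # (n_samples, n_components)
H = model.components_        # (n_components, n_features)
```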
That's it for me. Don't forget the what's new page. Nice job @TomDLT!
Sorry to be late on this, but could we address the scaling of alpha in another PR? I agree that scaling alpha could be useful, but I think it should probably be an option (default: False) for backward compatibility and consistency with the rest of scikit-learn. Also, the motivation for the way it is currently scaled is not completely obvious to me: my first intuition would have been to divide the loss term by n_samples x n_features. Although @TomDLT has some early experiments, it would be nice to take our time to compare different scalings.
Comments addressed, thanks. I also removed the alpha scaling from this PR, as suggested by @mblondel, and I will open a new PR for further discussion about it.
+1 for merge on my side.
+1 too.
🍻
Cool 🍻!
Sorry if I overlooked it, but did you commit / post your benchmark script using the "real" datasets? The current one only uses synthetic data, right?
I benchmarked it with "real" datasets, I think: 20 newsgroups, Olivetti faces, RCV1 and MNIST. I don't find the script anymore, though.
I think that would be great. Would you be interested in doing that? Otherwise, maybe open an issue?
I can do that.
See #5779.
This PR is a first part of what is discussed in #4811.
It includes:
- a refactoring of `_initialize_nmf()`
- regularization (with `alpha` and `l1_ratio`)

In some future PR, I will: …
Please tell me what you think :)
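Roughly, this is how the pieces fit together after this PR (a sketch with arbitrary parameter values; `solver`, `alpha` and `l1_ratio` are the parameters as introduced here, and were reworked in later scikit-learn versions):

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.abs(np.random.RandomState(42).randn(20, 10))
# 'cd' selects the new coordinate descent solver; alpha and l1_ratio
# control the amount and the L1/L2 mix of the regularization
model = NMF(n_components=4, solver='cd', alpha=0.1, l1_ratio=0.5,
            random_state=0)
W = model.fit_transform(X)
H = model.components_
```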