
[MRG+1] Improve benchmark on NMF #5779


Merged · 3 commits · Dec 20, 2016
Conversation

@TomDLT (Member) commented Nov 10, 2015

The previous benchmark used simulated data, did not use the new coordinate descent solver (#4852), and produced very uninformative plots:
[Plots from the previous benchmark]


This new benchmark tests NMF on two datasets:

  • 20 Newsgroups dataset: sparse, shape (11314, 39116)
  • Olivetti faces dataset: dense, shape (400, 4096)

It uses three different solvers:

and three different initialization schemes:

  • random
  • NNDSVD
  • NNDSVDAR

The total running time is about 2 minutes for the 20 Newsgroups dataset and 1 minute for the Olivetti faces dataset.
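The setup described above can be sketched roughly as follows. This is an illustrative sketch, not the actual benchmark code: a small random non-negative matrix stands in for the real datasets, and parameter values are arbitrary.

```python
import time

import numpy as np
from sklearn.decomposition import NMF

# Stand-in data: a small non-negative matrix instead of the real datasets.
rng = np.random.RandomState(0)
X = np.abs(rng.randn(100, 50))

# Time NMF for each initialization scheme and report the Frobenius
# reconstruction error, as the benchmark plots do.
for init in ("random", "nndsvd", "nndsvdar"):
    t0 = time.time()
    model = NMF(n_components=5, init=init, max_iter=200, random_state=0)
    W = model.fit_transform(X)
    H = model.components_
    err = np.linalg.norm(X - W @ H)  # Frobenius reconstruction error
    print("init=%s: %.2fs, error=%.2f" % (init, time.time() - t0, err))
```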

On the plots, each point corresponds to one more iteration.
[Benchmark plots: figure_1, figure_2]




m = Memory(cachedir='.', verbose=0)
Member: m --> mem maybe?
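The `Memory` line under review is joblib's on-disk caching helper. A minimal sketch of how it is used (note that recent joblib releases take a `location` argument where older versions, as in the line above, took `cachedir`; the function name here is hypothetical):

```python
import tempfile

import numpy as np
from joblib import Memory

# On-disk cache; newer joblib uses `location` instead of `cachedir`.
mem = Memory(location=tempfile.mkdtemp(), verbose=0)

@mem.cache
def expensive_step(n):
    # Stand-in for a costly benchmark computation.
    return np.arange(n).sum()

print(expensive_step(1000))  # computed on the first call
print(expensive_step(1000))  # served from the on-disk cache
```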

@ogrisel (Member) left a comment:

This needs a rebase; see my comments, but other than that, +1 for merge.

report(norm(X - np.dot(W, H)), tend)

return timeset, err
@ignore_warnings
Member: Please add a comment to explain which kind of warnings you expect to ignore here.

Member Author: It was a pandas indexing warning; I solved it and removed the ignore_warnings.
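The `report(norm(X - np.dot(W, H)), tend)` call in the fragment above records one (error, time) pair per iteration, which is what makes the error-versus-time plots possible. A hypothetical sketch of that pattern (the `report` helper and list names are illustrative, not the benchmark's exact code):

```python
import numpy as np
from numpy.linalg import norm

# Accumulators for one benchmark run: elapsed times and reconstruction errors.
timeset, err = [], []

def report(error, elapsed):
    # Record one measurement; the benchmark later plots err against timeset.
    err.append(error)
    timeset.append(elapsed)

# Toy factorization to exercise the helper.
rng = np.random.RandomState(0)
X = np.abs(rng.randn(20, 10))
W = np.abs(rng.randn(20, 3))
H = np.abs(rng.randn(3, 10))
report(norm(X - np.dot(W, H)), 0.5)
print(timeset, err)
```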



# use joblib to cache results.
# X_shape is passed as an argument to avoid hashing X
Member: Out of curiosity, how much overhead does the hashing of X add?

Member Author:

A few seconds on each dataset, i.e. for ~200 calls to bench_one.
If I recall correctly, I added it mostly for larger datasets like MNIST or RCV1.
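The pattern discussed here can be sketched with joblib's `ignore` option: the cached function receives `X_shape` as an explicit (hashed) argument that identifies the dataset, while `X` itself is excluded from the cache key so the large matrix is never hashed. The function body below is a hypothetical stand-in, not the benchmark's `bench_one`:

```python
import tempfile

import numpy as np
from joblib import Memory

mem = Memory(location=tempfile.mkdtemp(), verbose=0)

# X is excluded from the cache key; X_shape (cheap to hash) identifies it.
@mem.cache(ignore=["X"])
def bench_one(X, X_shape, n_components):
    # Stand-in computation using both the data and its shape.
    return X.sum() / (X_shape[0] * X_shape[1]) * n_components

rng = np.random.RandomState(0)
X = rng.rand(50, 40)
print(bench_one(X, X.shape, 3))
```

The trade-off is that the caller must guarantee `X_shape` (plus the other hashed arguments) uniquely identifies `X`, otherwise stale cached results can be returned.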

@TomDLT force-pushed the bench_nmf branch 2 times, most recently from 57cdf03 to 38561e1 on September 19, 2016
@TomDLT (Member Author) commented Sep 29, 2016

Rebased and comments addressed.
Another review?

@TomDLT TomDLT changed the title [MRG] Improve benchmark on NMF [MRG+1] Improve benchmark on NMF Oct 4, 2016
@TomDLT (Member Author) commented Oct 5, 2016

Question: should we keep the projected-gradient solver in this benchmark once it is removed from the package (in 0.19)?

@amueller (Member) commented Oct 5, 2016

Depends on how much trouble it is to keep them around, I'd say. It's informative to have it in the benchmark.

@amueller (Member) commented Oct 5, 2016

lgtm

@jnothman (Member):

Well, I think you should drop ProjectedGradient now that we're in 0.19; and ideally we'd get #5295 merged, but otherwise this does have +2 already.

@tguillemot (Contributor):

@TomDLT Can you rebase and drop ProjectedGradient?

@TomDLT (Member Author) commented Dec 16, 2016

Soon 😸

@TomDLT (Member Author) commented Dec 19, 2016

Update:

@tguillemot (Contributor) left a comment:

I can't comment on _PGNMF.
LGTM anyway.


###################
# Start of _PGNMF #
###################
@tguillemot (Contributor) commented Dec 19, 2016:

If you keep it, maybe you can add a comment saying where that code comes from and when it was removed from sklearn?

@amueller (Member) commented Dec 19, 2016 via email

@jnothman jnothman merged commit 3f0af16 into scikit-learn:master Dec 20, 2016
@jnothman (Member):
Thanks for the clarity @TomDLT !

@TomDLT TomDLT deleted the bench_nmf branch December 20, 2016 14:19
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
* ENH improve benchmark on nmf

* add projected gradient solver inside the benchmark file

* add comments and authors for _PGNMF
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
6 participants