[MRG+1] Improve benchmark on NMF #5779
Conversation
`m = Memory(cachedir='.', verbose=0)`
`m` --> `mem` maybe?
This needs a rebase + see my comments, but other than that +1 for merge.
`report(norm(X - np.dot(W, H)), tend)`
`return timeset, err`
`@ignore_warnings`
Please add a comment to explain which kind of warnings you expect to ignore here.
It was a pandas indexing warning; I solved it and removed the `ignore_warnings`.
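Side note for future readers: instead of a blanket `@ignore_warnings`, one can silence only the expected warning category so that unrelated warnings still surface. A minimal sketch (the `FutureWarning` below is a stand-in for the pandas indexing warning, not the benchmark's actual code):

```python
import warnings

def noisy_computation():
    # Stand-in for a call that triggers a pandas indexing warning.
    warnings.warn("chained indexing", FutureWarning)
    return 42

# Suppress only FutureWarning in this block; any other warning
# category would still be reported.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy_computation()
print(result)  # 42, with the warning silenced
```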
`# use joblib to cache results.`
`# X_shape is specified in arguments to avoid hashing X`
Out of curiosity, how much overhead does the hashing of X add?
A few seconds on each dataset, i.e. for ~200 calls to `bench_one`.
If I recall correctly, I added it mostly for larger datasets like MNIST or RCV1.
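For context, a minimal sketch of the caching pattern under discussion (illustrative names, not the exact benchmark signature; recent joblib uses `location` instead of `cachedir`): joblib's `ignore` option skips hashing the large array `X`, while the explicit `X_shape` argument keeps the cache key unique per dataset.

```python
import numpy as np
from joblib import Memory

mem = Memory(location='.', verbose=0)

@mem.cache(ignore=['X'])
def frobenius_norm(X, X_shape, dataset_name):
    # X itself is never hashed: X_shape and dataset_name alone
    # identify the call, so they must uniquely describe X.
    return np.linalg.norm(X)

X = np.random.RandomState(0).rand(500, 200)
print(frobenius_norm(X, X.shape, 'random'))  # computed, then cached
print(frobenius_norm(X, X.shape, 'random'))  # loaded from the cache
```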
Force-pushed from 57cdf03 to 38561e1.
Rebased and comments addressed.
Question: should we keep the projected-gradient solver in this benchmark, once it is removed from the package (in 0.19)?
Depends on how much trouble it is to keep them around, I'd say. It's informative to have it in the benchmark.
lgtm
Well, I think you should drop ProjectedGradient now that we're in 0.19; and ideally we'd get #5295 merged, but otherwise this does have +2 already.
@TomDLT Can you rebase and drop ProjectedGradient?
Soon 😸 |
Update: rebased, and the projected gradient solver is now included inside the benchmark file as `_PGNMF`.
I can't tell you about `_PGNMF`. LGTM anyway.
`###################`
`# Start of _PGNMF #`
`###################`
If you keep it, maybe you can add a comment to say where that code comes from and when it was removed from sklearn?
Keep it there I think.
Sent from phone. Please excuse spelling and brevity.
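For readers who land here later: `_PGNMF` keeps a projected gradient NMF solver alive inside the benchmark after its removal from the package (the removed sklearn solver was based on C.-J. Lin's projected gradient method). A toy fixed-step sketch of the idea, illustrative only and not the actual `_PGNMF` code:

```python
import numpy as np

def pg_nmf_toy(X, n_components, n_iter=500, step=1e-3, seed=0):
    """Fixed-step projected gradient descent on 0.5 * ||X - W H||_F^2.

    Alternates a gradient step on W and on H, projecting each factor
    back onto the nonnegative orthant with np.maximum(0, ...).
    """
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    W = rng.rand(n_samples, n_components)
    H = rng.rand(n_components, n_features)
    for _ in range(n_iter):
        W = np.maximum(0, W - step * (W @ H - X) @ H.T)
        H = np.maximum(0, H - step * W.T @ (W @ H - X))
    return W, H

X = np.abs(np.random.RandomState(42).randn(50, 30))
W, H = pg_nmf_toy(X, n_components=5)
print(np.linalg.norm(X - W @ H))  # reconstruction error shrinks with n_iter
```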
Thanks for the clarity @TomDLT!
* ENH improve benchmark on nmf
* add projected gradient solver inside the benchmark file
* add comments and authors for _PGNMF
The previous benchmark used simulated data, did not use the new coordinate descent solver (#4852), and I found the plots very uninformative:


This new benchmark tests NMF on two datasets (20 Newsgroups and Olivetti faces), using three different solvers and three different initialization schemes.
The total running time is about 2 minutes for the 20 Newsgroups dataset, and 1 minute for the Olivetti faces dataset.
On the plots, each point corresponds to one more iteration.
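For reference, a hedged sketch of how such time/error curves can be produced (illustrative names, not the actual benchmark code; `solver='mu'` requires scikit-learn >= 0.19): refit with an increasing iteration budget and record the elapsed time and Frobenius reconstruction error after each budget.

```python
import time
import numpy as np
from sklearn.decomposition import NMF

def time_error_curve(X, solver, n_components=10, max_points=30):
    # Refit with an increasing iteration budget; each (time, error)
    # pair becomes one point on the convergence plot.
    times, errors = [], []
    for n_iter in range(1, max_points + 1):
        model = NMF(n_components=n_components, solver=solver,
                    init='nndsvd', max_iter=n_iter, tol=0)
        t0 = time.time()
        W = model.fit_transform(X)  # may warn about non-convergence
        times.append(time.time() - t0)
        errors.append(np.linalg.norm(X - W @ model.components_))
    return times, errors

X = np.abs(np.random.RandomState(0).randn(200, 100))  # NMF needs X >= 0
t_cd, e_cd = time_error_curve(X, solver='cd')
t_mu, e_mu = time_error_curve(X, solver='mu')
```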

