Online implementation of Non-negative Matrix Factorization (NMF) #13308
Comments
Nice figure! Let me see with the others if this should go in.
Could you run on a wikipedia text example?
Can you do a draft PR just adding the code to scikit-learn, so that people can have a quick look at how complicated the code would be? Don't worry about tests, documentation, or full API compatibility for now. Just add the object (which I think should be called MiniBatchNMF).
Would be nice to see more full benchmarks, but I think in general this would be useful.

> would be nice to see more full benchmarks but I think in general this would be useful.
Alright. Let's work on this!
@amueller, any specific benchmark that you have in mind?
vary n_features and n_samples? |
@GaelVaroquaux, @amueller I have synchronized the code proposed by @pcerda in #13386 with master. I was unable to access the data set used in the previous comment, so I used the "Blog Authorship Corpus" instead. The code used to produce the plot is a variation on the topic extraction example. It is available here. Comments?
The speed-up is impressive. Thank you for highlighting it.
@cmarmo: can you push a PR with the new code, so that we can review and iterate on it? Thanks!!
I'm on it: just trying to get rid of the
> I can open a WIP PR if you think it's worth it,

I think that it's worth it.

> but I would rather you wait before reviewing the code.

It's likely that I won't have time in the short term. I see this PR more as a communication tool across the team.
@cmarmo in the benchmarks you made above, you display the convergence time. What is the convergence criterion, and do both versions converge to a solution of similar quality?
@jeremiedbb I'm working to add this information to the plot. Thanks for your comment.
Hi @jeremiedbb, and thanks for your patience; my plot behaved erratically, probably because someone else was running heavy computations on the machine at the same time... While waiting for the new version, let me just say that, if I understand correctly, the convergence is computed here: sklearn/decomposition/_nmf.py, line 860 at 5af23c9.
@jeremiedbb do this new plot and my previous comment answer your question?
So in all your benchmarks, online NMF converges faster to a better solution. Pretty convincing!
As I'm starting to play with the source: for the sake of reproducibility, this plot and this one have been produced with this version of the nmf code.
Hurray!! Congratulations!
Description
@GaelVaroquaux Maybe an online version of the NMF could be useful in big data settings.
Here I show the results for a sparse input matrix of size 1M x 2.8k. The factorization has n_components = 10, and each point is an entire pass over the data. The current online implementation only supports the Kullback-Leibler divergence, but in practice it can be generalized to any beta divergence.
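To make the idea concrete, here is a toy numpy sketch of mini-batch multiplicative updates for the Kullback-Leibler divergence, under the convention X ≈ W H. This is an illustration only, not the implementation proposed in this issue; `kl_minibatch_nmf` and all its parameters are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 1e-10  # guards against division by zero in the updates

def kl_minibatch_nmf(X, n_components=10, batch_size=256, n_epochs=5, inner_iter=3):
    """Toy mini-batch NMF with KL-divergence multiplicative updates.

    X ~ W @ H with all factors non-negative; W holds per-sample
    coefficients, H holds the shared components.
    """
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components)) + EPS
    H = rng.random((n_components, n_features)) + EPS
    for _ in range(n_epochs):
        for start in range(0, n_samples, batch_size):
            sl = slice(start, start + batch_size)
            Xb, Wb = X[sl], W[sl]
            # Refine the batch coefficients with a few multiplicative steps.
            for _ in range(inner_iter):
                R = Xb / (Wb @ H + EPS)
                Wb *= (R @ H.T) / (H.sum(axis=1) + EPS)
            # One multiplicative update of the shared components per batch.
            R = Xb / (Wb @ H + EPS)
            H *= (Wb.T @ R) / (Wb.sum(axis=0)[:, None] + EPS)
    return W, H
```

Each batch refines its own coefficient rows for a few inner steps, then performs one shared update of the components, so a full pass touches every sample once; multiplicative updates keep both factors non-negative by construction.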