[MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence #5295
Merged
Changes from all commits (35 commits)
All 35 commits are by TomDLT:

6e071f2  ENH add multiplicative-update solver in NMF, with all beta-divergence
7adab35  improve docstring
874944d  DOC Add links to references
f167ebf  FIX link to reference
e7ca049  add warning when solver='mu' and init='nndsvd'
4b5ac3f  revert all change in benchmark (separate PR)
fa246d1  add example in the doc and docstring
cbd78a5  change versionadded to 0.19
4277ae6  fix doctest
b7a63d4  address review's comments
4ad79d7  Temporary: test a stopping criterion in nmf-MU
bd6474a  update convergence criterion and tests to avoid warnings
9923554  normalize convergence criterion with error_at_init
44bffa7  Fix test adding a copy of shared inititalization
71d2d12  add NMF with KL divergence in topic extraction example
e572448  Fix add init parameter for custom init
a732dae  decrease to 10 iteration between convergence test
4f20b12  Fix the reconstruction error from x**2 / 2 to x
3b18a45  fix init docstring
6dff7c9  typo and improve test decreasing
6b20e30  remove unused private function _safe_compute_error
0ee3cbf  make beta_divergence function private
f428d20  Remove deprecated ProjectedGradientNMF
dd4d6b5  remove warning in test
31f2c0c  update doc
44779a1  FIX raise an error when beta_loss <= 0 and X contains zeros
4713b1c  TYPO epsilson -> epsilon
0d9fb50  remove other occurences of ProjectedGradientNMF
42de807  add whats_new.rst entry
2af4a23  minor leftovers
259d827  non-ascii and nitpick
057c70c  safe_min instead of min
a9ac84a  solve conflict with master
5e017c6  Merge branch 'master' into nmf_mu
928ea89  minor doc update
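Commit 44779a1 above adds a guard: for beta_loss <= 0 (the Itakura-Saito family) the divergence is undefined when X contains zeros, so fitting raises an error instead of diverging. A minimal sketch of that behavior from the user side (synthetic data; the exact error message text is the library's own, not reproduced here):

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1.0, 0.0],   # X contains a zero entry
              [2.0, 3.0]])

try:
    # beta_loss=0 is the Itakura-Saito divergence; with zeros in X
    # the multiplicative-update solver would diverge, so NMF raises.
    NMF(n_components=1, solver='mu', beta_loss=0, init='random',
        random_state=0).fit(X)
except ValueError as exc:
    print("raised:", exc)
```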
@@ -9,6 +9,10 @@
 The output is a list of topics, each represented as a list of terms
 (weights are not shown).

+Non-negative Matrix Factorization is applied with two different objective
+functions: the Frobenius norm, and the generalized Kullback-Leibler divergence.
+The latter is equivalent to Probabilistic Latent Semantic Indexing.
+
 The default parameters (n_samples / n_features / n_topics) should make
 the example runnable in a couple of tens of seconds. You can try to
 increase the dimensions of the problem, but be aware that the time
@@ -36,9 +40,10 @@

 def print_top_words(model, feature_names, n_top_words):
     for topic_idx, topic in enumerate(model.components_):
-        print("Topic #%d:" % topic_idx)
-        print(" ".join([feature_names[i]
-                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
+        message = "Topic #%d: " % topic_idx
+        message += " ".join([feature_names[i]
+                             for i in topic.argsort()[:-n_top_words - 1:-1]])
+        print(message)
     print()
@@ -71,17 +76,31 @@ def print_top_words(model, feature_names, n_top_words):
 t0 = time()
 tf = tf_vectorizer.fit_transform(data_samples)
 print("done in %0.3fs." % (time() - t0))
 print()

 # Fit the NMF model
-print("Fitting the NMF model with tf-idf features, "
+print("Fitting the NMF model (Frobenius norm) with tf-idf features, "
       "n_samples=%d and n_features=%d..."
       % (n_samples, n_features))
 t0 = time()
 nmf = NMF(n_components=n_topics, random_state=1,
           alpha=.1, l1_ratio=.5).fit(tfidf)
 print("done in %0.3fs." % (time() - t0))

-print("\nTopics in NMF model:")
+print("\nTopics in NMF model (Frobenius norm):")
 tfidf_feature_names = tfidf_vectorizer.get_feature_names()
 print_top_words(nmf, tfidf_feature_names, n_top_words)

+# Fit the NMF model
+print("Fitting the NMF model (generalized Kullback-Leibler divergence) with "
+      "tf-idf features, n_samples=%d and n_features=%d..."
+      % (n_samples, n_features))
+t0 = time()
+nmf = NMF(n_components=n_topics, random_state=1, beta_loss='kullback-leibler',
+          solver='mu', max_iter=1000, alpha=.1, l1_ratio=.5).fit(tfidf)
+print("done in %0.3fs." % (time() - t0))
+
+print("\nTopics in NMF model (generalized Kullback-Leibler divergence):")
+tfidf_feature_names = tfidf_vectorizer.get_feature_names()
+print_top_words(nmf, tfidf_feature_names, n_top_words)
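The diff above fits a second NMF model with solver='mu' and the generalized Kullback-Leibler beta_loss introduced by this PR. A minimal standalone sketch of the same estimator call on synthetic data (the PR-era alpha/l1_ratio regularization arguments and get_feature_names call are omitted, since their spelling has changed across scikit-learn releases; init='nndsvda' is passed explicitly to avoid the nndsvd warning mentioned in commit e7ca049):

```python
import numpy as np
from sklearn.decomposition import NMF

# Small synthetic non-negative "document-term" matrix.
rng = np.random.RandomState(0)
X = rng.rand(20, 30)

# Multiplicative-update solver with the generalized
# Kullback-Leibler divergence (beta = 1).
nmf = NMF(n_components=5, solver='mu', beta_loss='kullback-leibler',
          init='nndsvda', max_iter=1000, random_state=1)
W = nmf.fit_transform(X)   # per-sample topic weights
H = nmf.components_        # per-topic feature weights

print(W.shape, H.shape)  # (20, 5) (5, 30)
```

Both factors are non-negative by construction, which is what makes the rows of H interpretable as topics in the example above.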
@@ -0,0 +1,29 @@
+"""
+==============================
+Beta-divergence loss functions
+==============================
+
+A plot that compares the various Beta-divergence loss functions supported by
+the Multiplicative-Update ('mu') solver in :class:`sklearn.decomposition.NMF`.
+"""
+import numpy as np
+import matplotlib.pyplot as plt
+from sklearn.decomposition.nmf import _beta_divergence
+
+print(__doc__)
+
+x = np.linspace(0.001, 4, 1000)
+y = np.zeros(x.shape)
+
+colors = 'mbgyr'
+for j, beta in enumerate((0., 0.5, 1., 1.5, 2.)):
+    for i, xi in enumerate(x):
+        y[i] = _beta_divergence(1, xi, 1, beta)
+    name = "beta = %1.1f" % beta
+    plt.plot(x, y, label=name, color=colors[j])
+
+plt.xlabel("x")
+plt.title("beta-divergence(1, x)")
+plt.legend(loc=0)
+plt.axis([0, 4, 0, 3])
+plt.show()
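The _beta_divergence helper plotted above is a private scikit-learn function. For positive scalars, the divergence it traces can be written out directly; beta_div below is a standalone name chosen here for illustration, with the usual special cases at beta = 1 (generalized Kullback-Leibler) and beta = 0 (Itakura-Saito):

```python
import numpy as np

def beta_div(x, y, beta):
    """Scalar beta-divergence d_beta(x, y) for x, y > 0.

    beta = 2 -> half squared (Frobenius) error,
    beta = 1 -> generalized Kullback-Leibler,
    beta = 0 -> Itakura-Saito.
    """
    if beta == 1:   # generalized Kullback-Leibler
        return x * np.log(x / y) - x + y
    if beta == 0:   # Itakura-Saito
        return x / y - np.log(x / y) - 1
    # general case, continuous in beta
    return (x ** beta + (beta - 1) * y ** beta
            - beta * x * y ** (beta - 1)) / (beta * (beta - 1))

# beta = 2 reduces to half the squared error:
print(beta_div(1.0, 3.0, 2))  # 2.0 == (1 - 3)**2 / 2
```

All members of the family are zero exactly when x == y, which is why every curve in the plot above touches zero at x = 1.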
Review comment: Could you please insert the figure of the beta divergence loss function example after this formula?

Reply: Done