[MRG] Add scaling to alpha regularization parameter in NMF #5296
Conversation
sklearn/decomposition/nmf.py (outdated diff)
```python
alpha_W = 0.
# The priors for W and H are scaled differently.
if regularization in ('both', 'components'):
    alpha_H = float(alpha) * n_samples
```
Here is the proper scaling of alpha. All other modifications in this PR are just a clean way to pass the same regularization to every solver.
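For readers skimming the diff, here is a minimal, self-contained sketch of what the scaled regularization could look like; the helper name `compute_scaled_alphas` and the exact handling of the `regularization` options are my own illustration, not necessarily the code in this PR.

```python
def compute_scaled_alphas(alpha, regularization, n_samples, n_features):
    """Return (alpha_W, alpha_H) scaled as proposed in this PR.

    The penalty on W is scaled by n_features and the penalty on H by
    n_samples, so that both factors are penalized comparably even when
    X is not square.
    """
    alpha_W = 0.
    alpha_H = 0.
    if regularization in ('both', 'transformation'):
        alpha_W = float(alpha) * n_features
    if regularization in ('both', 'components'):
        alpha_H = float(alpha) * n_samples
    return alpha_W, alpha_H
```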
(Force-pushed from c832609 to 8060b44.)
What is plotted? Is it a norm of the matrices? If so, which one?
While I agree that with your data-dependent scaling the transfer effect seems to be smaller than without it, I am worried that this kind of scaling could be non-standard in the literature. Could you please review popular implementations of NMF to check whether they do this too? Have you seen this kind of scaling in the literature?
The plots show the mean of all elements in W and H, i.e. the L1 norm normalized by the number of elements. It all came from @vene's comment (#4852 (comment)), and to avoid blocking that PR, I opened this separate PR to discuss it further.
As #4852 is merged now, what is the status of this PR?
Status: Controversial
New idea: another, slightly more standard method could be to normalize H (aka the dictionary) to unit norm at each iteration, yet I am not sure of the consequences on regularization.
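A rough sketch of what that could look like, assuming each row of H (each dictionary atom) is rescaled to unit L2 norm after an update while W absorbs the inverse scale so that the product `W @ H` is unchanged; this is only an illustration of the idea, not code from this PR.

```python
import numpy as np

def normalize_dictionary(W, H, eps=1e-12):
    # Rescale each row of H to unit L2 norm and push the inverse scale
    # into the corresponding column of W, leaving W @ H unchanged.
    norms = np.linalg.norm(H, axis=1)
    norms = np.maximum(norms, eps)  # avoid division by zero for dead atoms
    H = H / norms[:, np.newaxis]
    W = W * norms[np.newaxis, :]
    return W, H
```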
Coming back to this PR after all those years: it's true that the proposed normalization of the regularizer makes a lot of sense from the look of the experiment. Could you try the same experiments on some natural datasets (e.g. TF-IDF vectors or scientific signal data) to see whether the norms of the matrices shrink similarly? However, we would need a new option (in the public API only) to select between the 2 regularizer definitions, along with a FutureWarning.
I agree this looks very convincing, especially if, as Olivier asks, we see the pattern in real data too. This seems like more than a usability issue, since we don't provide the API to search for separate alphas for W and H, which would be needed to construct an equivalent problem, right? We could have a …
Closed by #20512
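For reference, if I read #20512 correctly, recent scikit-learn releases expose this as separate `alpha_W` and `alpha_H` parameters, with the W penalty scaled internally by `n_features` and the H penalty by `n_samples`, which matches the scaling proposed here. A hedged usage sketch:

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.abs(np.random.RandomState(0).randn(100, 30))  # toy nonnegative data

# alpha_W penalizes W (scaled internally by n_features);
# alpha_H penalizes H (scaled internally by n_samples).
# alpha_H='same' reuses the value of alpha_W.
model = NMF(n_components=10, init='nndsvda', alpha_W=0.01, alpha_H='same',
            l1_ratio=0.5, max_iter=500, random_state=0)
W = model.fit_transform(X)
H = model.components_
```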
(I separated this modification from #4852 (comment) for further discussion about it.)

In NMF, I scaled the regularization parameter `alpha` with `n_samples` and `n_features`.

Indeed, without scaling, two problems appear:

- If X is not square (`n_samples != n_features`), the constraint is unbalanced: one of H or W goes to zero, and the other one, which is less penalized, increases to compensate.
- Even for a square X, the effect of a given `alpha` changes with the size of the data (the relevant scale grows with `sqrt(n_features * n_samples)`), which makes the appropriate value of `alpha` depend on the size of the data.

**Test to prove the point**: I tested several sizes for the input `X`, and plotted how the coefficients in W and H collapse with respect to the `alpha` parameter.

Without scaling alpha:


With proper scaling of alpha:

**Scaling used**: I used `alpha_W = alpha * n_features` and `alpha_H = alpha * n_samples`.
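For concreteness, a minimal sketch of this kind of experiment, written against the current scikit-learn API (which already applies the proposed scaling via `alpha_W`/`alpha_H`); the matrix sizes, alpha grid, and solver settings are placeholder choices, not the ones used for the plots above.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)
alphas = np.logspace(-3, 1, 9)

for n_samples, n_features in [(50, 200), (200, 50), (200, 200)]:
    X = np.abs(rng.randn(n_samples, n_features))
    for alpha in alphas:
        model = NMF(n_components=5, init='nndsvda', l1_ratio=1.0,
                    alpha_W=alpha, alpha_H='same', max_iter=300,
                    random_state=0)
        W = model.fit_transform(X)
        H = model.components_
        # Mean absolute value of the entries, i.e. the L1 norm divided by
        # the number of elements (what the plots above report).
        print(n_samples, n_features, alpha,
              np.abs(W).mean(), np.abs(H).mean())
```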
**Conclusion**: the effect of the `alpha` parameter is much more consistent if we scale it.

As L1 and L2 regularization in NMF is brand new (#4852), this would not really break any code (before 0.17 at least).
But do we want to add this?
Is it consistent with other estimators in scikit-learn?
What do you think?
@vene @mblondel