[MRG] Fixes 'math domain error' in sklearn.decomposition.PCA with "n_components='mle' #10359

thechargedneutron · 2017-12-22T03:20:21Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This is a PR originally by @lbillingham in #4827. All thanks to him.

Any other comments?

agramfort · 2017-12-24T09:39:56Z

sklearn/decomposition/tests/test_pca.py

+                                        n_redundant=1, n_clusters_per_class=1,
+                                        random_state=42)
+    pca = PCA(n_components='mle').fit(X)
+    assert_equal(pca.n_components_, 0)


isn't it surprising to get 0 components even if you have 1 informative feature?

@agramfort Yes, indeed. There's an off-by-one problem, as found in #4827 (comment). But I am reluctant to change

scikit-learn/sklearn/decomposition/pca.py

Line 105 in 5311c81

return ll.argmax()

this line to ll.argmax() + 1, as it may break existing code. Is there any workaround for this?
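To make the concern concrete, here is a minimal sketch of how the index returned by `argmax` maps to a rank. It assumes that `ll[i]` holds the log-likelihood assessed for rank `i`; the helper `pick_rank`, its `offset` parameter, and the scores are hypothetical and are not the scikit-learn implementation.

```python
import numpy as np


def pick_rank(log_likelihoods, offset=0):
    """Return the rank with the largest score.

    Assumption: log_likelihoods[i] scores rank i, so the argmax index
    is already the rank. `offset` models the proposed "+ 1" change.
    """
    return int(np.argmax(log_likelihoods)) + offset


# Made-up scores for ranks 0, 1, 2, 3 (not real PCA output).
ll = np.array([-12.0, -3.5, -7.0, -9.0])

current = pick_rank(ll)            # index of the best score
shifted = pick_rank(ll, offset=1)  # same data, answer moved up by one
```

Under that assumption, adding 1 unconditionally changes the result for every input, not only for the degenerate case that returns 0, which is why it could break existing code.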


thechargedneutron · 2017-12-26T18:29:34Z

@agramfort Does this look correct now?

agramfort · 2017-12-27T16:31:00Z

sklearn/decomposition/pca.py

@@ -448,7 +455,8 @@ def _fit_full(self, X, n_components):
        # Postprocess the number of components required
        if n_components == 'mle':
            n_components = \
-                _infer_dimension_(explained_variance_, n_samples, n_features)
+                _infer_dimension_(explained_variance_, n_samples,
+                                  n_features) + 1


You cannot change the behavior like this in any case. Was it a bug? When I complained earlier about n_components=0 I was surprised, but was that the expected behavior for the MLE option?

I am not sure, but this off-by-one problem is discussed in #4827 (comment).

@vene are you able to comment here? I assume we should not be making this change together with the present fix, but I've not looked into it at all.

I don't think there was an off-by-one error here formerly, the argmax seems to correspond correctly to the assessed rank. But if rank=0 is a bad answer, then we don't need to assess rank=0... If so, that's a separate bug, but this +1 should be removed, IMO.
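The alternative suggested above, not assessing rank 0 at all, can be sketched as follows. This is only an illustration under the assumption that `ll[i]` scores rank `i`; the function name `infer_rank_excluding_zero` is hypothetical and this is not presented as the fix that was eventually merged.

```python
import numpy as np


def infer_rank_excluding_zero(log_likelihoods):
    """Pick the best-scoring rank while never returning rank 0.

    Assumption: log_likelihoods[i] scores rank i. Rank 0 is masked out
    with -inf, so every other rank keeps its original index.
    """
    ll = np.asarray(log_likelihoods, dtype=float).copy()
    ll[0] = -np.inf  # rank 0 is treated as an invalid answer
    return int(ll.argmax())


# Made-up scores: rank 0 would win without the mask.
masked_choice = infer_rank_excluding_zero([-1.0, -3.0, -2.0])
# Made-up scores: rank 1 wins with or without the mask.
unchanged_choice = infer_rank_excluding_zero([-5.0, -1.0, -2.0])
```

Unlike a blanket `+ 1`, this only changes the outcome when rank 0 would otherwise have been selected.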

agramfort · 2017-12-28T14:51:32Z

If you look here: http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_vs_fa_model_selection.html you see that MLE gives 10 components, as expected.

jnothman · 2018-01-23T07:13:50Z

sklearn/decomposition/pca.py

@@ -448,7 +455,8 @@ def _fit_full(self, X, n_components):
        # Postprocess the number of components required
        if n_components == 'mle':
            n_components = \
-                _infer_dimension_(explained_variance_, n_samples, n_features)
+                _infer_dimension_(explained_variance_, n_samples,
+                                  n_features) + 1


@vene are you able to comment here? I assume we should not be making this change together with the present fix, but I've not looked into it at all.

jnothman · 2018-01-30T02:11:37Z

sklearn/decomposition/pca.py

@@ -47,6 +47,9 @@ def _assess_dimension_(spectrum, rank, n_samples, n_features):
        Number of samples.
    n_features : int
        Number of features.
+    rcond : float


why the name rcond?

jnothman · 2018-01-30T02:12:19Z

sklearn/decomposition/pca.py

@@ -47,6 +47,9 @@ def _assess_dimension_(spectrum, rank, n_samples, n_features):
        Number of samples.
    n_features : int
        Number of features.
+    rcond : float
+        Cut-off for values in `spectrum`. Any value lower than this
+        will be ignored (`default=1e-15`)


should the default be a function of dtype?
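One way to read this question is sketched below: derive the cut-off from the machine epsilon of the spectrum's floating-point dtype instead of hard-coding `1e-15`. The helpers `default_cutoff` and `safe_log_sum` are hypothetical names for illustration, and using `np.finfo(dtype).eps` as the default is an assumption, not something the PR settled on. The sketch also shows why a cut-off matters: `math.log` raises `ValueError: math domain error` for a zero value.

```python
import math

import numpy as np


def default_cutoff(dtype):
    """Hypothetical dtype-aware default: machine epsilon of a float dtype."""
    return float(np.finfo(dtype).eps)


def safe_log_sum(spectrum, cutoff=None):
    """Sum log(v) over spectrum values strictly above the cut-off.

    Values at or below the cut-off are ignored, so math.log never
    receives zero or a negative number.
    """
    spectrum = np.asarray(spectrum)
    if cutoff is None:
        cutoff = default_cutoff(spectrum.dtype)
    kept = spectrum[spectrum > cutoff]
    return sum(math.log(v) for v in kept)


# Made-up spectrum with an exactly-zero trailing value.
spectrum = np.array([2.0, 1.0, 0.0])

try:
    math.log(spectrum[-1])
    raised = False
except ValueError:
    raised = True  # this is the 'math domain error'

total = safe_log_sum(spectrum)
```

With this choice the threshold for `float32` input is larger than for `float64`, which reflects the lower precision of the narrower type.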

thechargedneutron added 2 commits December 22, 2017 00:05

conditions added and checked

10e8c3b

tests added

727e21f

thechargedneutron changed the title ~~[WIP] Fixes 'math domain error' in sklearn.decomposition.PCA with "n_components='mle'~~ [MRG] Fixes 'math domain error' in sklearn.decomposition.PCA with "n_components='mle' Dec 22, 2017

agramfort reviewed Dec 24, 2017


Merge branch 'master' of https://github.com/scikit-learn/scikit-learn into pca_svd_solver

5e94ed3

changes added

9cb7490

agramfort reviewed Dec 27, 2017


jnothman reviewed Jan 30, 2018


jnothman mentioned this pull request Apr 15, 2019

[WIP] PCA n_components='mle' instability. Issue 4441 #4827

Closed

jnothman mentioned this pull request May 23, 2019

Problems in sklearn.decomposition.PCA with "n_components='mle' option" #4441

Closed

amueller added the Needs work label Aug 6, 2019

jnothman added help wanted Stalled labels Oct 27, 2019

lschwetlick mentioned this pull request Jan 25, 2020

[MRG+1] Adress decomposition.PCA mle option problem #16224

Merged

lschwetlick mentioned this pull request Feb 25, 2020

Off-By-One Error in _pca with 'mle' #16546

Closed

github-actions bot added the module:decomposition label Mar 2, 2020

rth closed this in #16224 Mar 4, 2020
