[MRG before #12069] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues #12145
Conversation
- Added a few comments to clarify `_fit_transform`, `fit_transform` and `transform`.
- Unified the numpy coding style for `fit` and `fit_transform`.
…rnel conditioning. Fixes scikit-learn#12140
Needs review: the only failing check is coverage. To improve it we would need to produce tests where the kernel matrix has artificial defects. Would that be ok? How do we check that warnings are properly raised?
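For what it's worth, here is a minimal sketch of how such a test could be written with `pytest.warns`, fitting on a deliberately non-PSD precomputed kernel. The warning class and the `match` string are assumptions, since whether a given defect warns or errors (and with which message) depends on the final behaviour of this PR:

```python
import numpy as np
import pytest
from sklearn.decomposition import KernelPCA

def test_conditioning_warning():
    # Deliberately ill-conditioned "kernel": symmetric but not PSD
    # (eigenvalues are 13 and -3, so one is significantly negative).
    K = np.array([[5.0, -8.0],
                  [-8.0, 5.0]])

    # Assumed warning class and message fragment -- adjust to whatever
    # the checks introduced by this PR actually emit.
    with pytest.warns(UserWarning, match="eigenvalue"):
        KernelPCA(n_components=1, kernel="precomputed").fit(K)
```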
… It is used in `KernelPCA`. Added corresponding test `test_errors_and_warnings` for KernelPCA.
While waiting for the review, I proposed a way to fix the coverage and improve maintainability: the method that checks the kernel eigenvalues (…) is now factored out. In addition, I created a test for kPCA to check that warnings and errors are raised correctly in case of bad conditioning.
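For illustration, such a factored-out check could look roughly like the sketch below; the function name, the tolerances and the messages are mine, not necessarily those of the PR:

```python
import warnings
import numpy as np

def check_kernel_eigenvalues(lambdas, enable_warnings=True):
    """Illustrative check of the eigenvalues of a supposedly PSD kernel.

    Removes imaginary parts, significant negative values and tiny
    values, optionally warning about the defects found.
    """
    lambdas = np.asarray(lambdas)

    # A PSD kernel has real eigenvalues: any imaginary part is
    # numerical noise from the solver, so drop it.
    if np.iscomplexobj(lambdas):
        if enable_warnings:
            warnings.warn("Kernel eigenvalues have imaginary parts; "
                          "setting them to zero.")
        lambdas = np.real(lambdas)

    max_eig = lambdas.max()

    # Significantly negative eigenvalues mean the kernel is badly
    # conditioned, or not PSD at all (illustrative tolerance).
    if (lambdas < -1e-5 * max_eig).any() and enable_warnings:
        warnings.warn("Kernel has significant negative eigenvalues: it is "
                      "badly conditioned or not PSD; setting them to zero.")

    # Remaining negative or tiny positive values are round-off noise
    # around zero: set them all to zero (illustrative tolerance again).
    lambdas = np.where(lambdas < 1e-10 * max_eig, 0.0, lambdas)
    return lambdas
```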
…ange failure in Travis.
…ed ('randomized' is not available in this branch !)
…o always check the inner check_kernel_eigenvalues method before the kPCA fit.
…aced during call to fit() - we now execute the test directly on `_fit_transform` instead, so that the matrix is untouched.
… this time :) ).
…e still errors of this kind.
…s where there is no zero division, to avoid numpy warnings.
…o zero division numpy warning.
…ture to perform fine-grain warning assertion
# Conflicts:
#	sklearn/decomposition/tests/test_kernel_pca.py
I had some afterthoughts on one of the warnings: it is in fact normal to find quasi-zero eigenvalues when the number of samples is high enough (my intuition, in the case of a Gaussian kernel, is that this happens once that number exceeds the underlying distribution's manifold dimensionality, but this paper may offer better explanations). For this reason I will push a new commit so that no warning is raised about zero eigenvalues by default.
…all: this is most probably a common case especially when the number of samples gets high. Removing the warning by default.
…more, this is normal behaviour as the number of samples increases.
All set! Ready to merge once #12143 has been merged.
@adrinjalali are you ok with:
We are clearly over-engineering this PR by trying too hard to future-proof it. Let's keep things simple.
…g` is now `False` by default
One clarification concerning your last sentences, @NicolasHug ("one parameter to control all the warnings" and "I would be ok to remove all the warnings"): please have a look at my previous comments; there is a huge difference between the warning for significant negative values (which could even be transformed into a …
I'm happy for a significant negative value to be a …
Also, @smarie, I really appreciate your persistence and your following the suggestions so promptly. I'm sorry it's dragging on for longer than we would all prefer :)
Perfect. I'll make the changes right now and push what should be the final proposal. Almost there :)!
* Renamed `small_nonzeros_warning` to `enable_warnings`.
* Consistent warnings are now raised for all three cases (imaginary parts, negative values, small non-zero values), and the parameter disables all of them.
* Improved string formatting by using `%g` instead of `%f` and the like.
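On the `%g` point: unlike `%f`, `%g` switches to scientific notation for small magnitudes, which keeps near-zero eigenvalues readable in the messages. A plain illustration, not code from the PR:

```python
eig = -2.5e-12
print("found negative eigenvalue %f" % eig)  # -0.000000 -> information lost
print("found negative eigenvalue %g" % eig)  # -2.5e-12  -> readable
```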
Ready for a last round. That "simple" last change actually had quite an impact, but I think the result is now straightforward and consistent. I updated the docstring, please have a look. In detail: …
…th the others. Now adopting the same message everywhere.
…ot copied back. Added it.
…y parts are all zeros, to convert to float dtype
…messages explicitly state what is happening (setting to zero the imag/negative/small value).
All seems ok now; stopping edits and waiting for the final review @adrinjalali @NicolasHug.
Thanks @smarie, only nitpicks from me but LGTM
Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
Thanks very much for the review. @adrinjalali you're up next (if you have a few minutes to devote to this) :)
ping @adrinjalali
LGTM, thanks a lot @smarie and @NicolasHug
Merged!
Yay!!
So cool! I will now be able to finalize #12069.
…r faster partial decompositions, like in PCA (#12069)

Co-authored-by: Sylvain MARIE <sylvain.marie@se.com>
Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
This PR fixes #12140 by performing numerical and conditioning checks on the eigenvalues obtained from the kernel decomposition.
This PR depends on #12143, which solves an issue in `transform` when zero eigenvalues are present in the kernel.
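For context, the `transform` issue comes from scaling the projection by `1 / sqrt(eigenvalue)`, which divides by zero when an eigenvalue is exactly zero. Below is a minimal sketch of the guarded scaling, with illustrative names (`alphas`, `lambdas`) rather than the exact attributes touched by #12143:

```python
import numpy as np

def project(K, alphas, lambdas):
    """Illustrative projection step of a fitted kernel PCA.

    K: centered kernel between new samples and training samples.
    alphas, lambdas: eigenvectors and eigenvalues of the training kernel.
    The naive K @ (alphas / np.sqrt(lambdas)) divides by zero whenever
    an eigenvalue is exactly zero; scaling only the non-zero columns
    avoids the numpy warning and the resulting inf/nan values.
    """
    non_zeros = np.flatnonzero(lambdas)
    scaled_alphas = np.zeros_like(alphas)
    scaled_alphas[:, non_zeros] = (alphas[:, non_zeros]
                                   / np.sqrt(lambdas[non_zeros]))
    return K @ scaled_alphas
```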