Skip to content

[MRG before #12069] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues #12145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 123 commits into from
Nov 14, 2019

Conversation

smarie
Copy link
Contributor

@smarie smarie commented Sep 24, 2018

This PR fixes #12140 by performing some numerical and conditioning checks on the eigenvalues found from kernel decomposition.

This PR is dependent on #12143, that solves an issue happening in transform when zero eigenvalues are present in the kernel.

 - Added a few comments to clarify `_fit_transform`, `fit_transform` and `transform`.
 - Uniformized the numpy coding style for `fit` and `fit_transform`.
@smarie
Copy link
Contributor Author

smarie commented Sep 24, 2018

Needs review: the only failing check is coverage. To improve it we would need to produce tests where the kernel matrix has artificial defects. Would it be ok ? How to check that warnings are properly raised ?

… It is used in `KernelPCA`. Added corresponding test `test_errors_and_warnings` for KernelPCA.
@smarie
Copy link
Contributor Author

smarie commented Sep 25, 2018

While waiting for the review I proposed a way to fix the coverage and improve maintainability: the method to check the kernel eigenvalues (check_kernel_eigenvalues) is now completely independent, in the validation submodule. It comes with some doctests.

In addition I created a test for kPCA to check that warnings and errors are raised correctly in case of bad conditioning.

…ed ('randomized' is not available in this branch !)
…o always check the inner check_kernel_eigenvalues method before the kPCA fit.
…aced during call to fit() - we now execute the test directly on `_fit_transform` instead, so that the matrix is untouched.
@smarie smarie changed the title KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues [MRG] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues Sep 25, 2018
@smarie
Copy link
Contributor Author

smarie commented Oct 17, 2018

I had some afterthoughts on one of the warnings: indeed it is normal to find quasi-zero eigenvalues when the number of samples is high enough (my intuition would be, in the case of a gaussian kernel, that it is when this number is larger than the underlying distribution's manifold dimensionality, but maybe in this paper there are better explanations).

For this reason I will push a new commit where there is no warning by default about zero eigenvalues.

…all: this is most probably a common case especially when the number of samples gets high. Removing the warning by default.
…more, this is normal behaviour as the number of samples increases.
@smarie
Copy link
Contributor Author

smarie commented Oct 17, 2018

All set ! Ready to merge once #12143 will have been.

@smarie smarie changed the title [MRG] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues [MRG after #12143] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues Oct 17, 2018
@jnothman jnothman changed the title [MRG after #12143] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues [MRG] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues Feb 28, 2019
@NicolasHug
Copy link
Member

@adrinjalali are you ok with:

  • having only one parameter to control the warnings
  • leaving that false by default
  • discussing everything else in other PRs / issues

We are clearly over engineering this PR by trying too hard to future-proof it. Let's keep things simple.
(I would even be OK to remove all the warnings. The ValueErrors are already an improvement).

@smarie
Copy link
Contributor Author

smarie commented Nov 7, 2019

a precision concerning your last sentences @NicolasHug "one parameter to control all the warnings" and "I would be ok to remove all the warnings": please have a look at my previous comments, there is a huge difference between the warning for significant negative values (which could even be transformed into a ValueError) and the warning for small non-zeros.

@adrinjalali
Copy link
Member

I'm happy with a significant negative value to be a ValueError, but the rest should all be controlled by one single parameter. And for the rest, I'm happy with your proposal @NicolasHug

@adrinjalali
Copy link
Member

Also, @smarie I really appreciate your persistence and you following the suggestions so promptly. I'm sorry it's being dragged for longer than we all would prefer :)

@smarie
Copy link
Contributor Author

smarie commented Nov 7, 2019

Perfect. I make the changes right now and push what should finally be the final proposal. Almost there :) !

 * Renamed `small_nonzeros_warning` into `enable_warnings`.
 * now consistent warnings are raised for all three cases (imaginary parts, negative, small non zero), and the parameter disables all of them.
 * improved string formatting using `%g` instead of `%f` or other things
@smarie
Copy link
Contributor Author

smarie commented Nov 7, 2019

Ready for a last round. That "simple" last change was actually quite impactant but I think that the result is now straightforward and consistent. I updated the docstring, please have a look.

In details:

  • Renamed small_nonzeros_warning into enable_warnings.
  • now consistent warnings are raised for all 3 cases of non-significant things set to zero (new imaginary parts, negative values, small non-zero values). Basically everytime the method modifies the eigenvalues, there is a potential warning telling why. The enable_warnings parameter enables/disables all of them at once, as requested.
  • improved string formatting using %g instead of %f or other things
  • updated the tests and got rid of the useless extra test, that is now included in the other

Sylvain MARIE added 4 commits November 7, 2019 22:07
…th the others. Now adopting the same message everywhere.
…y parts are all zeros, to convert to float dtype
…messages explicitly state what is happening (setting to zero the imag/negative/small value).
@smarie
Copy link
Contributor Author

smarie commented Nov 8, 2019

All seems ok now, stopping edits and waiting for final review @adrinjalali @NicolasHug .

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @smarie, only nitpicks from me but LGTM

Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
@smarie
Copy link
Contributor Author

smarie commented Nov 8, 2019

Thanks very much for the review. @adrinjalali you're up next (if you have a few minutes to devote to this) :)

@NicolasHug
Copy link
Member

ping @adrinjalali

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot @smarie and @NicolasHug

@NicolasHug NicolasHug merged commit 3432140 into scikit-learn:master Nov 14, 2019
@NicolasHug
Copy link
Member

Merged!
Thank you so much @smarie for your consistent work on this

@jnothman
Copy link
Member

jnothman commented Nov 14, 2019 via email

@smarie
Copy link
Contributor Author

smarie commented Nov 15, 2019

So cool ! I will now be able to finalize #12069
Thanks so much for the hard reviewing task !

adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Nov 18, 2019
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Nov 18, 2019
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
ogrisel added a commit that referenced this pull request Apr 27, 2021
…r faster partial decompositions, like in PCA (#12069)

Co-authored-by: Sylvain MARIE <sylvain.marie@se.com>
Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KernalPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues
8 participants