Skip to content

[MRG] KernelPCA: fix transform issue when zero eigenvalues are present and not removed (issue 12141) #12143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Mar 21, 2019

Conversation

smarie
Copy link
Contributor

@smarie smarie commented Sep 24, 2018

This fixes #12141

 - Added a few comments to clarify `_fit_transform`, `fit_transform` and `transform`.
 - Uniformized the numpy coding style for `fit` and `fit_transform`.
@smarie
Copy link
Contributor Author

smarie commented Sep 24, 2018

All set, waiting for review.

@smarie smarie changed the title KernelPCA: fix transform issue when zero eigenvalues are present and not removed (issue 12141) [MRG] KernelPCA: fix transform issue when zero eigenvalues are present and not removed (issue 12141) Sep 25, 2018
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, this looks like a legit bug.

Here are a few first comments, but we would need some core contributors to jump in. @amueller @jnothman maybe?

smarie added 2 commits October 9, 2018 17:52
…s where there is no zero division, to avoid numpy warnings.
…ture to perform fine-grain warning assertion
@NicolasHug
Copy link
Member

Thanks @smarie , LGTM. Let's wait for other reviews now.

@smarie
Copy link
Contributor Author

smarie commented Oct 12, 2018

Thanks @NicolasHug . If you have still a couple time for reviews in the next days, I would really appreciate a review for the companion PRs of this PR:

@NicolasHug
Copy link
Member

Your #12069 (comment)

the randomized method seems to naturally find the "perfect zero eigenvalues" while arpack and dense methods most of the time find a tiny non-zero number instead (e.g. of the order of magnitude of 1e-16 if my memory is correct)

makes me think we should not test for strict equality with zero but instead have an approximate comparison.

@smarie
Copy link
Contributor Author

smarie commented Oct 16, 2018

Actually that's already taken care of - but in #12145 :) where we round to zero the small eigenvalues that are due to bad numerical conditioning (as well as negative eigenvalues because a kernel is supposed to be semi-positive definite so they are probably numerical errors).

See https://github.com/scikit-learn/scikit-learn/pull/12145/files#diff-3b70045c110a30d29de66ed0ea3fb86dR1082 for details

@NicolasHug
Copy link
Member

OK. I guess this would make sense to do it here?

IMHO the way to go would be to first merge this one, then #12145, and then #12069. This way you can merge master each time to avoid the redundant changes in the diffs (right now #12145 diffs also show the changes from here and that make reviewing trickier)

@smarie
Copy link
Contributor Author

smarie commented Oct 17, 2018

Yes, this is the merge order that should be used since each PR contains and relies on the commits from the previous. I am not sure that it would make sense to backport the zero eigenvalues rounding here: indeed, that is precisely what #12145 is about : controling and harmonizing the eigenvalues found by the various solvers.

@NicolasHug
Copy link
Member

You're right, if #12145 sets small eigenvalues to 0 then the code here won't have to change.

@smarie
Copy link
Contributor Author

smarie commented Nov 15, 2018

Hi @NicolasHug , @jnothman , @ogrisel : any update on this PR and the subsequent ones #12145 and #12069 ? Experience shows that the more we wait the more likely the PRs will not be mergeable again...

Of course I understand that this is a volunteer-based open-source project with many contributions, but I really believe that the effort we collectively made on these three PR both in terms of idea (@grilling original code in matlab), coding, and code review, is mature now.

@smarie
Copy link
Contributor Author

smarie commented Mar 2, 2019

@jnothman I updated what's new following your advice on the other kPCA PR. The documentation does not need to be updated, this is ready to Merge (this PR is the first of a series of 3 : #12143, #12145 and #12069)

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicks from me, this still LGTM.

I find the comments that you added very helpful.

Hope we can get other reviewers on this!

@NicolasHug
Copy link
Member

Maybe @adrinjalali and @qinhanmin2014 could to review this? Should be quick to review IMO (LGTM already) and it'd be nice to address @smarie great efforts ;)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits about the comments. Otherwise LGTM

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @smarie

@jnothman jnothman added this to the 0.21 milestone Mar 20, 2019
@NicolasHug
Copy link
Member

This is +2 now, @thomasjpfan can you merge it please?

@thomasjpfan thomasjpfan merged commit 2f5bb34 into scikit-learn:master Mar 21, 2019
@smarie
Copy link
Contributor Author

smarie commented Mar 22, 2019

Thanks everyone!

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KernelPCA: fit_transform and transform methods are inconsistent in case of zero eigenvalues
4 participants