Skip to content

[MRG+1] Fix for 8368 (Addresses nan output from fit_transform() in kernelPCA) #8531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

AlaaMoussawi
Copy link

@AlaaMoussawi AlaaMoussawi commented Mar 4, 2017

Fixes #8368
Addresses nan output from fit_transform() function of sklearn/decomposition/kernel_pca.py
Handles nan output when taking square root of zero with nan_to_num function. Suppresses warnings.

Reference Issue

What does this implement/fix? Explain your changes.

Any other comments?

@AlaaMoussawi AlaaMoussawi changed the title Fix for 8368 Fix for 8368 (Addresses nan output from fit_transform() in kernelPCA) Mar 4, 2017
# All eigenvalues of Covariance matrix are >= 0
# Ensure root of zero is not evaluated as nan
try:
err_mgt = np.seterr(all='ignore')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use with np.errstate

gamma=settings_gamma,
coef0=settings_coef0)

# kpca_transform = kpca.fit_transform(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the debugging code

Copy link
Member

@raghavrv raghavrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Please add a whatsnew entry...

# Ensure root of zero is not evaluated as nan

with np.errstate(all='ignore'):
X_transformed = self.alphas_*np.nan_to_num(np.sqrt(self.lambdas_))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces surrounding *

@@ -3,7 +3,8 @@

from sklearn.utils.testing import (assert_array_almost_equal, assert_less,
assert_equal, assert_not_equal,
assert_raises)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rewrite it as one import per line? (helps avoid merge conflicts)

coef0=settings_coef0)

output = assert_no_warnings(kpca.fit_transform, data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all blank lines can be removed... (except the one after data maybe)


output = assert_no_warnings(kpca.fit_transform, data)

assert_false(np.isnan(output).any())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just to confirm, it fails at master?)

Copy link
Author

@AlaaMoussawi AlaaMoussawi Mar 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code would not fail at master, but it would output warning messages, and not give quite the desired output. Also, I'm not quite sure what a "whatsnew" entry is, but I don't think this would qualify. If you feel that it would, could you please provide a link that I can use to understand more about what a "whatsnew" entry is?


kpca = KernelPCA(n_components=data.shape[1],
kernel=kernel_type,
degree=settings_degree,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly substitute all of them as you are anyway using named args for clarity...

@raghavrv raghavrv changed the title Fix for 8368 (Addresses nan output from fit_transform() in kernelPCA) [WIP] Fix for 8368 (Addresses nan output from fit_transform() in kernelPCA) Mar 13, 2017
@raghavrv raghavrv added the Bug label Mar 13, 2017
@jnothman
Copy link
Member

Please provide more descriptive commit messages

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jnothman jnothman changed the title [WIP] Fix for 8368 (Addresses nan output from fit_transform() in kernelPCA) [MRG+1] Fix for 8368 (Addresses nan output from fit_transform() in kernelPCA) Mar 22, 2017
Copy link
Member

@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set negative eigenvalues (i.e. self.lambdas_ to zero in _fit_transform after the if self.remove_zero_eig or self.n_components is None.

@jnothman
Copy link
Member

jnothman commented Mar 22, 2017 via email

@lesteve
Copy link
Member

lesteve commented Mar 22, 2017

i think the problem may be that sometimes it stores negative zero, no?

I think so. It feels that changing self.lambdas_ in _fit_transform (which is actually called by fit) is a cleaner fix. At the moment the fix is in fit_transform but not transform for example.

@jnothman
Copy link
Member

jnothman commented Mar 22, 2017 via email

@jnothman
Copy link
Member

jnothman commented Mar 22, 2017 via email

@lesteve
Copy link
Member

lesteve commented Mar 22, 2017

Good catch. Will explicitly setting to zero ensure it's never negative? I suppose so.

I am not a KernelPCA expert by any means but it seems from this line that when remove_zero_eig is True, we remove negative values in self.lambdas_ so it may make sense to set them to zero when remove_zero_eig is False, especially since you are going to take np.sqrt to self.lambdas_ further down the line.

Maybe the 'sigmoid' kernel is to blame, I am not sure ... in the stand-alone snippet from the associated issue, kpca.lambdas_ has a negative eigenvalue of the order 1e-3, so it is hard to blame floating point precision for this.

@jnothman jnothman added this to the 0.19 milestone Jun 18, 2017
@rth rth modified the milestones: 0.19, 0.22 Jun 16, 2019
@amueller
Copy link
Member

amueller commented Aug 6, 2019

What's the status of this? I can still reproduce the issue on master.

@jnothman
Copy link
Member

can a couple of people find time to think this through?

@NicolasHug
Copy link
Member

Now that #12145 is merged, the error is

ValueError: There are significant negative eigenvalues (0.19119 of the maximum positive). Either the matrix is not PSD, or there was an issue while computing the eigendecomposition of the matrix.

Tweeking coef0 and / or gamma (or just the kernel) will remove the error.

So it seems that the specific settings of the kernel are the cause of the issue, and it's now properly handled by erroring with a nice message instead of returning NaNs. I'll then close the issue and the PR, but feel free to re-open if I missed something. Thanks everyone for the efforts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KPCA fit_transform() will be nan when lambdas_ is negative
7 participants