
Fix Make centering inplace in KernelPCA #29100


Merged

Conversation

jeremiedbb (Member) commented May 24, 2024

KernelPCA has a copy_X parameter, which suggests that it may perform in-place operations. The operation that could be done in place is the centering of the kernel using a KernelCenterer object. However, by default it makes a copy before applying the centering.

So currently, the only effect of copy_X is to make an unnecessary copy. Since there's already a copy_X parameter, we can safely make the centering in-place in fit.
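
For illustration, a minimal sketch of the in-place centering this enables. It relies on the public copy flag of KernelCenterer.transform; the data and names are made up for the example, and it assumes the kernel is already float64 so validation makes no copy:

```python
import numpy as np
from sklearn.preprocessing import KernelCenterer

rng = np.random.default_rng(0)
X = rng.random((100, 5))
K = X @ X.T  # (n_samples, n_samples) kernel matrix

# With copy=False, KernelCenterer.transform centers K in place instead of
# allocating a second n_samples x n_samples array.
centerer = KernelCenterer().fit(K)
K_centered = centerer.transform(K, copy=False)
print(K_centered is K)  # True: K itself was centered
```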

Remark

The centering could also be done in place in transform, but given the semantics of copy_X, it may be confusing. Maybe leave it as is for now and wait for a global rework of copy/copy_X.

```
copy_X : bool, default=True
    If True, input X is copied and stored by the model in the `X_fit_`
    attribute. If no further changes will be done to X, setting
    `copy_X=False` saves memory by storing a reference.
```

I'm not sure this needs a test or even an entry in the changelog, tell me :)

jeremiedbb added the Quick Review (For PRs that are quick to review) label on May 24, 2024
github-actions bot commented May 24, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d436bcd.

OmarManzoor (Contributor) left a comment

LGTM. Thanks @jeremiedbb

lesteve (Member) commented May 31, 2024

Just curious, does this optimization make any difference in a real use case? Or is it more of a clean-up?

I am not a big fan of functions with side effects, which make it easier to shoot yourself in the foot later.

Maybe two suggestions:

  • change the function name to make it clear that it changes its argument in place; I don't have a good suggestion ... maybe _fit_transform_in_place, _fit_transform_no_copy, or something similar (a rough sketch follows this list)?
  • add a comment somewhere saying that it is fine to do it in place because K is created in fit and is not reused afterwards?
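
For concreteness, a rough and purely hypothetical sketch of the first suggestion (the helper name and body here are illustrative, not the code in this PR):

```python
from sklearn.preprocessing import KernelCenterer

def _fit_transform_in_place(centerer: KernelCenterer, K):
    # Hypothetical helper: the "_in_place" suffix makes the side effect on K
    # explicit. Mutating K is fine because, in `fit`, K is created locally
    # and is not reused afterwards.
    centerer.fit(K)
    centerer.transform(K, copy=False)  # centers K in place
```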

jeremiedbb (Member, Author) commented
> I am not a big fan of functions with side effects

The purpose of the copy param is to have functions and estimators with side effects 😄

I added some comments to make it clear that we do in-place ops on purpose.

> Just curious, does this optimization make any difference in a real use case? Or is it more of a clean-up?

I was trying to build a common test for estimators with a copy param. This one was failing because the copy param had no effect. So it was either deprecating the param or making it have an actual effect. Since I believe the param was introduced to center the kernel in place, and it was an easy fix, I went for the latter.

```diff
@@ -439,7 +439,9 @@ def fit(self, X, y=None):
         self.gamma_ = 1 / X.shape[1] if self.gamma is None else self.gamma
         self._centerer = KernelCenterer().set_output(transform="default")
         K = self._get_kernel(X)
-        self._fit_transform(K)
+        # safe to perform in place operations on K because a copy was made before if
```
lesteve (Member) commented Jun 4, 2024

So I think that was the thing I was not sure about: K is a new array, right? Or is there somehow a way that K is a view on X for a particular kernel, so that transforming K in place would change X?

jeremiedbb (Member, Author) replied

K is X in the case of a precomputed kernel and copy=False. The goal of copy=False is to allow changing X in place.
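
To illustrate (a hedged example; it assumes the precomputed kernel is already float64 and C-contiguous, so input validation makes no copy):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.random((20, 3))
K = X @ X.T  # user-precomputed linear kernel

kpca = KernelPCA(n_components=2, kernel="precomputed", copy_X=False)
kpca.fit(K)
# With copy_X=False the model stores a reference rather than a copy, so the
# kernel it works on is the caller's own array: centering it in place would
# silently mutate K.
print(kpca.X_fit_ is K)  # True (no dtype/layout conversion happened)
```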

lesteve (Member) commented Jun 4, 2024

OK, the 'precomputed' case was not obvious to me. Can you add this to the comment, and also find a way to mention that copy_X has already made a copy if needed? (This is something you probably mentioned in the issue, but it wasn't obvious to me either 😅)

A member commented

Indeed let's add a comment to make that case explicit.

lesteve (Member) commented Jun 4, 2024

> The purpose of the copy param is to have functions and estimators with side effects 😄

I am not a big fan of this either 😉. Also, since the semantics of copy=False is "this may avoid a copy in some cases, hope for the best", it seems rather brittle.

jeremiedbb (Member, Author) commented

> Also, since the semantics of copy=False is "this may avoid a copy in some cases, hope for the best", it seems rather brittle.

The purpose of copy is to allow the estimator to perform in-place operations on the input X (copy=False). You set copy=False only if you don't intend to use X afterwards.

jeremiedbb (Member, Author) commented

Anyway let's not use this PR as the place to discuss the future of the copy parameter :)

lesteve (Member) commented Jun 4, 2024

> The purpose of copy is to allow the estimator to perform in-place operations on the input X (copy=False). You set copy=False only if you don't intend to use X afterwards.

Sorry, I was probably not explicit enough with "hope for the best". I meant that copy=False can still make a copy; in other words, there is no guarantee that copy=False makes no copy.

Yes I agree about not turning this PR into a discussion about the copy parameter 😉
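
As an aside, a small example of that "no guarantee" behavior with sklearn.utils.check_array, the usual input-validation helper:

```python
import numpy as np
from sklearn.utils import check_array

X = np.arange(6, dtype=np.int64).reshape(3, 2)
# copy=False is best effort only: the dtype conversion requested below
# still forces a copy.
X_checked = check_array(X, dtype=np.float64, copy=False)
print(X_checked is X)  # False: int64 -> float64 needed a new array

X2 = X.astype(np.float64)
X2_checked = check_array(X2, dtype=np.float64, copy=False)
print(X2_checked is X2)  # True: already conforming, no copy made
```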

lesteve (Member) left a comment

I think the comment is good enough to be able to reconstruct the reasoning, thanks!

Labels: module:decomposition, Quick Review (For PRs that are quick to review)