Fix Make centering inplace in KernelPCA #29100
Conversation
LGTM. Thanks @jeremiedbb
Just curious: does this optimization make any difference in a real use case, or is it more of a clean-up? I am not a big fan of functions with side effects, which make it easier to shoot yourself in the foot later. Maybe two suggestions:
I added some comments to make it clear that we do in-place ops on purpose.
I was trying to build a common test for estimators with a …
sklearn/decomposition/_kernel_pca.py (Outdated)

```python
@@ -439,7 +439,9 @@ def fit(self, X, y=None):
    self.gamma_ = 1 / X.shape[1] if self.gamma is None else self.gamma
    self._centerer = KernelCenterer().set_output(transform="default")
    K = self._get_kernel(X)
    self._fit_transform(K)
    # safe to perform in place operations on K because a copy was made before if …
```
So I think that was the thing I was not sure about: `K` is a new array, right? Is there a way that `K` could somehow be a view on `X` for a particular kernel, such that transforming `K` in place would change `X`?
`K` is `X` in the case of a precomputed kernel with `copy_X=False`. The goal of `copy_X=False` is to allow changing `X` in place.
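The aliasing described above can be sketched in plain NumPy. This is a hypothetical illustration, not scikit-learn's actual code; the fact that the fitted kernel is the user's `X` itself in the precomputed case comes from this discussion:

```python
import numpy as np

# Sketch: with kernel="precomputed" and copy_X=False, the kernel matrix K
# used inside fit is the very array the caller passed in, so an in-place
# centering step would mutate the caller's data.
rng = np.random.default_rng(0)
G = rng.normal(size=(5, 3))
X = G @ G.T                  # the user's precomputed kernel matrix
X_before = X.copy()

K = X                        # no copy: K aliases the caller's array
K -= K.mean(axis=0, keepdims=True)   # one in-place centering step

print(np.shares_memory(K, X))        # True: same underlying buffer
print(np.array_equal(X, X_before))   # False: the caller's matrix changed
```

This is exactly why `copy_X=True` (the default) makes a defensive copy first.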
OK, the `'precomputed'` case was not obvious to me. Can you add this to the comment, and also find a way to mention that `copy_X` has already made a copy if needed? (This is something you probably mentioned in the issue, but it wasn't obvious to me either 😅)
Indeed let's add a comment to make that case explicit.
I am not a big fan of this either 😉. Also since …
The purpose of …
Anyway, let's not use this PR as the place to discuss the future of the …
Sorry, I was probably not explicit enough with "hope for the best"; I meant …
Yes, I agree about not turning this PR into a discussion about the …
I think the comment is good enough to be able to reconstruct the reasoning, thanks!
`KernelPCA` has a `copy_X` parameter, which makes one think that it may perform inplace operations. The operation that could be done inplace is the centering of the kernel using a `KernelCenterer` object. However, `KernelCenterer` makes a copy before applying the centering by default. So currently, the only effect of `copy_X` is to make an unnecessary copy. Since there's already a `copy_X` parameter, we can safely make the centering inplace in `fit`.

Remark

The centering could also be done inplace in `transform`, but given the semantics of `copy_X`, it may be confusing. Maybe leave it as is for now and wait for a global rework of `copy`/`copy_X`.

scikit-learn/sklearn/decomposition/_kernel_pca.py
Lines 149 to 152 in a63b021

I'm not sure this needs a test or even an entry in the changelog, tell me :)
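For context, the centering operation the PR makes inplace can be sketched in plain NumPy. This is a hedged sketch of kernel double centering (the quantity `KernelCenterer` computes), not scikit-learn's actual implementation; once `copy_X` has made its defensive copy, mutating `K` like this is safe:

```python
import numpy as np

# Double centering of a kernel matrix, done entirely in place on K:
#   K <- K - row_means - col_means + grand_mean
# Performed as two sequential in-place subtractions, which is equivalent.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = X @ X.T

K -= K.mean(axis=0, keepdims=True)   # subtract column means (in place)
K -= K.mean(axis=1, keepdims=True)   # subtract updated row means (in place)

# After double centering, every row and column of K averages to zero.
print(np.allclose(K.mean(axis=0), 0))  # True
print(np.allclose(K.mean(axis=1), 0))  # True
```

The two sequential subtractions work because the second pass removes row means of the already column-centered matrix, which folds the grand-mean correction in automatically.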