FIX a bug in KernelPCA.inverse_transform #19732
Conversation
We had a thorough look at this issue/PR with @ogrisel. You are completely right that it is unjustified to add the regularization parameter at reconstruction time. We also think that it is not really useful to account for the mean loss with the linear kernel: it makes the linear kernel a special case, while one should rather use `PCA` in that situation. So for this PR, we can revert the change that added the regularization term. In addition, we think that we can make the following improvements:
In addition, I am thinking about some other improvements:
@kstoneriv3 Would you mind going forward by reverting the previous PR and improving the test? Do you wish to contribute to the future improvements?
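For readers following along, here is a rough sketch (simplified, not the actual scikit-learn source) of the behaviour being discussed: `alpha` regularizes the kernel ridge fit of the inverse map, while the change being reverted had also added it to the kernel at reconstruction time.

```python
from scipy import linalg
from sklearn.metrics.pairwise import rbf_kernel


def fit_inverse_transform(X_transformed, X, alpha, gamma):
    # Kernel ridge fit of the pre-image map: solve (K + alpha * I) dual_coef = X.
    # This is where the regularization parameter alpha belongs.
    K = rbf_kernel(X_transformed, gamma=gamma)
    K.flat[:: K.shape[0] + 1] += alpha
    return linalg.solve(K, X, assume_a="pos")


def inverse_transform(X_new_transformed, X_transformed_fit, dual_coef, gamma):
    # Reconstruction of new points: no extra alpha should be added to this
    # kernel, which is what the reverted change was doing.
    K = rbf_kernel(X_new_transformed, X_transformed_fit, gamma=gamma)
    return K @ dual_coef
```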
I agree that we should rather revert the buggy change and improve the documentation.
OK, I will replace the old test with this one.
Do you mean raising a warning when …? I generally agree with all of the suggestions you made. If no further discussion is needed before these updates, I can do it. Would you like to have some of these updates in the form of separate PRs?
If we do this, we should do it in a dedicated PR with backward compatibility. I think it is low priority.
Improving the docstring would be enough.
Indeed, separate PRs would be helpful. Let's start with undoing the previous change. For the test, we can keep a test that checks that we can approximately recover the original data, using the Frobenius norm of the reconstruction error. For the test data, you could try to use the Swiss roll dataset, for instance.
I tried the Swiss roll, but the reconstruction quality was not very good. This is because, for the Swiss roll, kernel PCA's 'rbf' kernel cannot capture well the similarity of data points along the third axis. So I just used a different dataset for the test.
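A minimal sketch of the kind of round-trip test discussed above, assuming a synthetic `make_blobs` dataset and an illustrative tolerance (both are my choices, not necessarily the ones used in the PR):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA


def test_kernel_pca_inverse_transform_reconstruction():
    # Check that inverse_transform approximately recovers the original data,
    # measured with a relative Frobenius norm.
    X, _ = make_blobs(n_samples=100, n_features=4, random_state=0)
    kpca = KernelPCA(
        n_components=20, kernel="rbf", fit_inverse_transform=True, alpha=1e-3
    )
    X_reconstructed = kpca.inverse_transform(kpca.fit_transform(X))
    error = np.linalg.norm(X - X_reconstructed, ord="fro") / np.linalg.norm(X, ord="fro")
    assert error < 1e-1
```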
I will make changes to the documentation later.
I have a question: it seems that centering of the kernel is not applied in the …
Isn't it what … does?
It seems to me that …
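For reference, kernel centering in scikit-learn is handled by `sklearn.preprocessing.KernelCenterer`, which `KernelPCA` uses internally when fitting and transforming. A small illustrative snippet of how it is applied to a kernel matrix:

```python
import numpy as np
from sklearn.preprocessing import KernelCenterer
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.RandomState(0).normal(size=(50, 3))
K = rbf_kernel(X)

# Center the kernel matrix in feature space, as kernel PCA requires.
centerer = KernelCenterer().fit(K)
K_centered = centerer.transform(K)
```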
The progress status of this PR:
The following can be included in this PR if we agree on what to rename them to.
The following should be in separate PRs, I guess.
@kstoneriv3 I think that we can limit this PR to the first two points. It will be easier to review and merge. So I will review this PR shortly (@ogrisel can maybe do the second review). In parallel, do not hesitate to already open the subsequent PRs.
OK, great. Then I will leave this PR as it is and make separate PRs for the other issues.
I tried to reproduce the "denoising example from Section 4" but the denoising quality is not as good as I expected (left: training images, middle test images to be denoised, right: denoised images). At this quality, I would not add a new example of denoising or mention the effect of alpha on denoising in the document. This might be due to the fact that sklearn's kernel PCA shares kernel parameters while in the paper they use RBF kernel with different parameters for kernel PCA itself and preimage. The code is available at my gist. |
I tried to reproduce the exact example with the same hyperparameters as in the paper and the same dataset (USPS). I put the code in the "details" section. I get results close to the original paper:

# %%
from sklearn.datasets import fetch_openml
usps = fetch_openml(data_id=41082)
# %%
data = usps.data
target = usps.target
# %%
import numpy as np
img = np.reshape(data.iloc[0].to_numpy(), (16, 16))
# %%
import matplotlib.pyplot as plt
plt.imshow(img)
# %%
from sklearn.model_selection import train_test_split
data_rest, data_train, target_rest, target_train = train_test_split(
    data, target, stratify=target, random_state=42, test_size=100,
)
data_rest, data_test, target_rest, target_test = train_test_split(
    data_rest, target_rest, stratify=target_rest, random_state=42,
    test_size=100,
)
data_train, data_test = data_train.to_numpy(), data_test.to_numpy()
# %%
fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(15, 15))
for img, ax in zip(data_test, axs.ravel()):
    ax.imshow(img.reshape((16, 16)), cmap="Greys")
    ax.axis("off")
_ = fig.suptitle("Uncorrupted test dataset")
# %%
rng = np.random.RandomState(0)
# add Gaussian noise to the test images (noise shape must match data_test)
noise = rng.normal(scale=0.5, size=data_test.shape)
data_test_corrupted = data_test + noise
# %%
fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(15, 15))
for img, ax in zip(data_test_corrupted, axs.ravel()):
    ax.imshow(img.reshape((16, 16)), cmap="Greys")
    ax.axis("off")
_ = fig.suptitle(
    f"Corrupted test data: "
    f"MSE={np.mean((data_test - data_test_corrupted) ** 2):.2f}",
    size=26,
)
# %%
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(
    n_components=80, kernel="rbf", gamma=0.5, fit_inverse_transform=True,
    alpha=1.0,
)
# %%
kpca.fit(data_train)
# %%
# reconstruct (denoise) the corrupted test images with kernel PCA
data_reconstruct = kpca.inverse_transform(kpca.transform(data_test_corrupted))
# %%
fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(15, 15))
for img, ax in zip(data_reconstruct, axs.ravel()):
    ax.imshow(img.reshape((16, 16)), cmap="Greys")
    ax.axis("off")
_ = fig.suptitle(
    f"Denoising using Kernel PCA with RBF kernel: "
    f"MSE={np.mean((data_test - data_reconstruct) ** 2):.2f}",
    size=26,
)
# %%
from sklearn.decomposition import PCA
pca = PCA(n_components=32)
pca.fit(data_train)
data_reconstruct = pca.inverse_transform(pca.transform(data_test_corrupted))
# %%
fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(15, 15))
for img, ax in zip(data_reconstruct, axs.ravel()):
    ax.imshow(img.reshape((16, 16)), cmap="Greys")
    ax.axis("off")
_ = fig.suptitle(
    f"Denoising using PCA: "
    f"MSE={np.mean((data_test - data_reconstruct) ** 2):.2f}",
    size=26,
)
We might want to play with the parameter alpha, since I am not sure about the standardization (it is not super precise), and I am not sure which normalization was applied to the dataset available on OpenML.
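A quick, illustrative way to check the value range of the OpenML data and, if needed, rescale it (using `MinMaxScaler` here is just one option; the variable names refer to the code above):

```python
from sklearn.preprocessing import MinMaxScaler

# Inspect the raw range to see which normalization the OpenML version uses.
print(data_train.min(), data_train.max())

# One possible rescaling to [0, 1], fit on the training split only.
scaler = MinMaxScaler().fit(data_train)
data_train_scaled = scaler.transform(data_train)
data_test_scaled = scaler.transform(data_test)
```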
@glemaitre Thank you! The denoising quality is much better than in my attempt! Now it makes sense to add this to the examples.
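Regarding the earlier point about the paper using different RBF parameters for the kernel PCA and the pre-image: here is a hypothetical sketch of how that could be approximated with a separate `KernelRidge` for the pre-image map, reusing `data_train` and `data_test_corrupted` from the code above (the `gamma`/`alpha` values are placeholders, not tuned):

```python
from sklearn.decomposition import KernelPCA
from sklearn.kernel_ridge import KernelRidge

# Forward transform with its own kernel parameters.
kpca = KernelPCA(n_components=80, kernel="rbf", gamma=0.5)
Z_train = kpca.fit_transform(data_train)

# Pre-image map learned separately, so it can use a different gamma/alpha.
preimage = KernelRidge(kernel="rbf", gamma=0.1, alpha=1.0)
preimage.fit(Z_train, data_train)

data_denoised = preimage.predict(kpca.transform(data_test_corrupted))
```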
LGTM. I am wondering if we could make the reconstruction better, but I don't know if it is needed. I would rely on a review from @ogrisel.
A quick note and this should be good:
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@kstoneriv3 thanks for your work. I started to modify the example, and we saw with @ogrisel that this bug was really affecting the reconstruction results. I will push my changes for this example tomorrow. If you want, you can review it.
Reference Issues/PRs

Fix #18902.

As discussed in #18902, `PCA` reconstructs the mean of the data while `KernelPCA` does not. This results in inconsistent inverse transformations by `PCA` and (linear-)`KernelPCA`. This inconsistency led to a misunderstanding and the introduction of a bug in #16655.

What does this implement/fix? Explain your changes.

As discussed above, a bug was introduced in `KernelPCA.inverse_transform` in #16655. This PR removes this bug. Additionally, I suggest a small modification to `KernelPCA.inverse_transform` to improve its compatibility with `PCA.inverse_transform` when the linear kernel is used: reconstruct the mean when the linear kernel is used, so that `KernelPCA.inverse_transform` recovers the mean in the same way as `PCA.inverse_transform` does.
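To make the inconsistency described above concrete, here is a small illustrative sketch (my own, not part of the PR) comparing the round-trip reconstructions of `PCA` and a linear-kernel `KernelPCA`; the dataset and number of components are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2).fit(X)
kpca = KernelPCA(
    n_components=2, kernel="linear", fit_inverse_transform=True
).fit(X)

X_back_pca = pca.inverse_transform(pca.transform(X))
X_back_kpca = kpca.inverse_transform(kpca.transform(X))

# Compare the mean of the original data with the means of both reconstructions;
# PCA adds the data mean back explicitly, while KernelPCA relies on the learned
# kernel ridge pre-image map instead.
print("data mean:               ", X.mean(axis=0))
print("PCA reconstruction:      ", X_back_pca.mean(axis=0))
print("KernelPCA reconstruction:", X_back_kpca.mean(axis=0))
```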