
[MRG+1] Enforce deterministic output in kernel PCA #13241


Merged
merged 8 commits into from
Feb 26, 2019

Conversation

bellet
Contributor

@bellet bellet commented Feb 25, 2019

Fixes #8798

This PR adds a call to svd_flip to ensure that the output of KPCA is deterministic, as done for PCA.

Example code:

import numpy as np
from sklearn.decomposition import KernelPCA

np.random.seed(42)
data = np.random.rand(12).reshape(4, 3)

for i in range(10):
    kpca = KernelPCA(n_components=2, eigen_solver='arpack')
    print(kpca.fit_transform(data)[0])

Output before this PR:

[-0.42930193 -0.11840723]
[0.42930193 0.11840723]
[-0.42930193 -0.11840723]
[0.42930193 0.11840723]
[-0.42930193 -0.11840723]
[0.42930193 0.11840723]
[-0.42930193 -0.11840723]
[ 0.42930193 -0.11840723]
[0.42930193 0.11840723]
[-0.42930193 -0.11840723]

With this PR:

[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]
[-0.42930193 -0.11840723]

Some comments:

  • Calling svd_flip directly requires copying the fitted self.alphas_, which is not ideal. The alternative would be to add an equivalent of svd_flip for eigendecompositions (taking a single input instead of two). I can easily do this if that is the preferred solution.
  • This deterministic behavior does not seem to be tested for PCA, so I did not add any test here either. Can do if needed.
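To make the fix concrete, here is a minimal NumPy sketch of the sign convention that svd_flip enforces (flip_signs is a hypothetical helper, not part of scikit-learn): in each column, the entry with the largest absolute value is made positive, which resolves the sign ambiguity of eigenvectors.

```python
import numpy as np

def flip_signs(u):
    # Hypothetical helper mimicking svd_flip's u-based convention:
    # in each column, make the entry with the largest absolute value
    # positive, so the sign ambiguity of eigenvectors is resolved.
    max_abs_rows = np.argmax(np.abs(u), axis=0)
    signs = np.sign(u[max_abs_rows, np.arange(u.shape[1])])
    return u * signs

# Two sign-ambiguous copies of the same components resolve to the
# same canonical representation after flipping.
u = np.array([[-0.42930193, -0.11840723],
              [ 0.25,        0.60      ]])
print(flip_signs(u))
print(flip_signs(-u))
```

Both calls print the same array, which is exactly the determinism this PR adds to KernelPCA.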

@agramfort
Member

I confirm that it fixes the problem, but can you still add a test and update what's new?

yes it should be done for PCA too.

@bellet
Contributor Author

bellet commented Feb 25, 2019

I confirm that it fixes the problem, but can you still add a test and update what's new?

yes it should be done for PCA too.

Done.

@@ -71,6 +71,23 @@ def test_kernel_pca_consistent_transform():
assert_array_almost_equal(transformed1, transformed2)


def test_kernel_pca_deterministic_output():
state = np.random.RandomState(0)
Member

state -> rng

it's how we usually name RandomState instances.

Member

@agramfort agramfort left a comment

besides my nitpick

good to go from my end

@bellet bellet changed the title [MRG] Enforce deterministic output in kernel PCA [MRG+1] Enforce deterministic output in kernel PCA Feb 25, 2019
@bellet
Contributor Author

bellet commented Feb 25, 2019

Done!

# flip eigenvectors' sign to enforce deterministic output
# note: copying the second element is needed so that both inputs do
# not refer to the same object
self.alphas_, _ = svd_flip(self.alphas_, self.alphas_.copy().T)
Member

would it work to do

svd_flip(self.alphas_, np.empty_like(self.alphas_).T)

to avoid a memory copy?

Contributor Author

Correct, since the value of the second argument does not affect our output. Done.
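This can be seen with a minimal NumPy mimic of svd_flip's u-based branch (svd_flip_sketch is an illustrative stand-in for sklearn.utils.extmath.svd_flip with u_based_decision=True, not the real implementation): the signs are computed from the first argument only, so passing an uninitialized array as the second argument gives the same flipped components as passing a copy.

```python
import numpy as np

def svd_flip_sketch(u, v):
    # Sketch of svd_flip with u_based_decision=True: the signs come
    # solely from u, so the contents of v never influence the returned u.
    max_abs_cols = np.argmax(np.abs(u), axis=0)
    signs = np.sign(u[max_abs_cols, np.arange(u.shape[1])])
    return u * signs, v * signs[:, np.newaxis]

u = np.array([[ 0.6, -0.8],
              [-0.8, -0.6]])
u_copy, _ = svd_flip_sketch(u, u.copy().T)            # copying variant
u_empty, _ = svd_flip_sketch(u, np.empty_like(u).T)   # no-copy variant
```

Since only the flipped first output is kept in KernelPCA, the no-copy variant saves an allocation without changing the result.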

transformed_X[i, :] = kpca.fit_transform(X)[0]
i += 1

assert np.isclose(transformed_X, transformed_X[0, :]).all()
Member

Better to use numpy.testing.assert_allclose.

Member

I find that understanding what is done in this test takes some effort, because we check equality across both different random seeds and solvers (and we store the results in a matrix with a non-standard (n_random_seeds*n_solvers, n_components) shape). Maybe something like,

    for solver in eigen_solver:
        transformed_X = np.zeros((10, 2))
        for i in range(10):
            kpca = KernelPCA(n_components=2, eigen_solver=solver,
                             random_state=i)
            transformed_X[i, :] = kpca.fit_transform(X)[0]

        assert_allclose(transformed_X, transformed_X[0, :])

could be easier?

Member

Also, is it needed to set the random state to i rather than just using random_state=rng?

Contributor Author

random_state=rng would use the same seed for all runs. The point here is to check that the ARPACK random state does not result in different solutions because of sign ambiguity.
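The sign ambiguity being tested can be illustrated directly with SciPy's ARPACK wrapper (an illustrative sketch, independent of scikit-learn): eigenvectors are only defined up to sign, so two runs with different random starting vectors v0 may return negated copies of the same component.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

M = np.random.RandomState(0).rand(6, 6)
A = M + M.T  # make the matrix symmetric so eigsh applies

# Different random starting vectors: ARPACK may converge to either
# sign of the same eigenvector.
_, v1 = eigsh(A, k=1, v0=np.random.RandomState(1).rand(6))
_, v2 = eigsh(A, k=1, v0=np.random.RandomState(2).rand(6))

# The components agree up to sign:
print(np.allclose(np.abs(v1), np.abs(v2)))
```

This is why the test varies the random state across runs: it exercises different ARPACK starting vectors and checks that svd_flip makes the final output identical anyway.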

Contributor Author

I used random_state=i to show explicitly that the random state changes from one run to the next. But removing it would be valid as well, as the random state would still change.

Member

I may be wrong, but I don't agree that random_state=rng would use the same seed, unless rng is an int. Here, rng is a RandomState object (initialized a few lines above), so it has internal state. For instance:

~{main}$ ipython
Python 3.7.2 (default, Feb 11 2019, 11:12:39)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: rng = np.random.RandomState(0)

In [3]: rng.randn(3)
Out[3]: array([1.76405235, 0.40015721, 0.97873798])

In [4]: rng.randn(3)
Out[4]: array([ 2.2408932 ,  1.86755799, -0.97727788])

Contributor Author

You're right, I can switch to that

transformed_X[i, :] = pca.fit_transform(X)[0]
i += 1

assert np.isclose(transformed_X, transformed_X[0, :]).all()
Member

Same comments as above

# note: copying the second element is needed so that both inputs do
# not refer to the same object
self.alphas_, _ = svd_flip(self.alphas_,
                           np.empty_like(self.alphas_).T)
Member

awesome, thanks! maybe remove the note: above since it's no longer relevant.

@bellet
Contributor Author

bellet commented Feb 25, 2019

I have simplified the tests as suggested by @vene @rth

@vene
Member

vene commented Feb 25, 2019

LGTM pending CI pass! Thanks @bellet

@agramfort agramfort merged commit 8647658 into scikit-learn:master Feb 26, 2019
@agramfort
Member

thx @bellet

@rth
Member

rth commented Feb 26, 2019

Thanks !

Does this mean we can remove the _UnstableOn32BitMixin from the KernelPCA class and tests will pass on 32 bit arch as well? (The corresponding estimator tag is non_deterministic)

@ogrisel
Member

ogrisel commented Mar 5, 2019

The cron jobs for the nightly builds on Linux 32-bit pass:

https://travis-ci.org/MacPython/scikit-learn-wheels/builds

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
* enforce deterministic output in kernel PCA

* add tests and update whats new

* replace state by rng

* simplified assert

* avoid copy

* clarify tests

* remove now useless comment

* use rng as seed everywhere
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019