-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Enforce deterministic output in kernel PCA #13241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I confirm that it fixes the pb but can you still add a test and udpate what's new? yes it should be done for PCA too. |
Done. |
@@ -71,6 +71,23 @@ def test_kernel_pca_consistent_transform(): | |||
assert_array_almost_equal(transformed1, transformed2) | |||
|
|||
|
|||
def test_kernel_pca_deterministic_output(): | |||
state = np.random.RandomState(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state -> rng
it's how we call RandomState usually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
besides my nitpick
good to go from my end
Done! |
sklearn/decomposition/kernel_pca.py
Outdated
# flip eigenvectors' sign to enforce deterministic output | ||
# note: copying the second element is needed so that both inputs do | ||
# not refer to the same object | ||
self.alphas_, _ = svd_flip(self.alphas_, self.alphas_.copy().T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it work to do
svd_flip(self.alphas_, np.empty_like(self.alphas_).T)
to avoid a memory copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct since the value of the second argument does not affect our output. Done
transformed_X[i, :] = kpca.fit_transform(X)[0] | ||
i += 1 | ||
|
||
assert np.isclose(transformed_X, transformed_X[0, :]).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use numpy.testsing.assert_allclose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that undestand what is done in this test takes some effort here because we check equality for both different random seeds and solvers (and we store the results in a matrix that has non nstandard (n_random_seeds*n_solvers, n_components)
shape. Maybe something like,
for solver in eigen_solver:
transformed_X = np.zeros((10, 2))
for i in range(10):
kpca = KernelPCA(n_components=2, eigen_solver=solver,
random_state=i)
transformed_X[i, :] = kpca.fit_transform(X)[0]
assert_allclose(transformed_X, transformed_X[0, :])
could be easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also is it needed to set the random state to i rather than just use random_state=rng
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random_state=rng
would use the same seed for all runs. The point here is to check that the arpack random state does not result in different solutions because of sign ambiguity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used random_state=i
to show explicitly that the random state changes from one run to the next. But removing it would be valid as well as the random state would still change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be wrong but i don't agree that random_state=rng
would use the same seed, unless rng
is an int. Here, rng
is a RandomState
object (initialized a few lines above) so it has internal state. For instance:
~{main}$ ipython
Python 3.7.2 (default, Feb 11 2019, 11:12:39)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numpy as np
In [2]: rng = np.random.RandomState(0)
In [3]: rng.randn(3)
Out[3]: array([1.76405235, 0.40015721, 0.97873798])
In [4]: rng.randn(3)
Out[4]: array([ 2.2408932 , 1.86755799, -0.97727788])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I can switch to that
transformed_X[i, :] = pca.fit_transform(X)[0] | ||
i += 1 | ||
|
||
assert np.isclose(transformed_X, transformed_X[0, :]).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as above
# note: copying the second element is needed so that both inputs do | ||
# not refer to the same object | ||
self.alphas_, _ = svd_flip(self.alphas_, | ||
np.empty_like(self.alphas_).T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, thanks! maybe remove the note:
above since it's no longer relevant.
LGTM pending CI pass! Thanks @bellet |
thx @bellet |
Thanks ! Does this mean we can remove the |
The CRON jobs for nightly builds on linux 32 bit pass: |
* enforce deterministic output in kernel PCA * add tests and update whats new * replace state by rng * simplified assert * avoid copy * clarify tests * remove now useless comment * use rng as seed everywhere
…arn#13241)" This reverts commit 11d5539.
…arn#13241)" This reverts commit 11d5539.
* enforce deterministic output in kernel PCA * add tests and update whats new * replace state by rng * simplified assert * avoid copy * clarify tests * remove now useless comment * use rng as seed everywhere
Fixes #8798
This PR adds a call to
svd_flip
to ensure that the output of KPCA is deterministic, as done for PCA.Example code:
Output before this PR:
With this PR:
Some comments:
svd_flip
directly requires to copy the fittedself.alphas_
, which is not great. The alternative would be to add an equivalent ofsvd_flip
for eigendecomposition (taking a single input instead of two). I can easily do this if this is the preferred solution.