TST use global_random_seed in sklearn/decomposition/tests/test_sparse_pca.py #31213


Merged: 6 commits into scikit-learn:main from the tests branch, Apr 17, 2025

Conversation

DeaMariaLeon (Contributor)

Reference Issues/PRs

Towards #22827

What does this implement/fix? Explain your changes.

Any other comments?

I wonder why test_mini_batch_fit_transform is kept. It is skipped all the time. It comes from PR #12253

cc @glemaitre

Tests updated to use global_random_seed:

  • test_fit_transform
  • test_fit_transform_parallel
  • test_fit_transform_tall
  • test_initialization
  • test_scaling_fit_transform
  • test_pca_vs_spca
  • test_sparse_pca_inverse_transform
  • test_transform_inverse_transform_round_trip
github-actions bot commented Apr 16, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit 0bb7336.

@DeaMariaLeon (Contributor, Author)

Testing with all the seed values passed locally.

@lucyleeow (Member) left a comment

LGTM!

(no idea about test_mini_batch_fit_transform though ...)

@jeremiedbb (Member) left a comment

I think you can add global_random_seed to

  • test_transform_nan
  • test_sparse_pca_numerical_consistency
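
Below is a hedged sketch of what wiring the fixture into one of the tests listed above could look like. The test body is illustrative, not the PR's actual diff; global_random_seed is the pytest fixture from scikit-learn's test suite, whose seed sweep is controlled via the SKLEARN_TESTS_GLOBAL_RANDOM_SEED environment variable.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

def test_transform_nan(global_random_seed):
    # The fixture injects an int seed; pass it straight to the estimator.
    rng = np.random.RandomState(global_random_seed)
    X = rng.randn(20, 10)
    X[:, 0] = 0.0  # an all-zero feature, the kind of input that could yield NaN
    spca = SparsePCA(n_components=2, random_state=global_random_seed)
    assert not np.isnan(spca.fit_transform(X)).any()
```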

Comment on lines 195 to 198
```python
spca = SparsePCA(
    alpha=0, ridge_alpha=0, n_components=2, random_state=global_random_seed
)
pca = PCA(n_components=2, random_state=global_random_seed)
```
@jeremiedbb (Member)

It's not required to start from the same seed in the test, is it?

Suggested change:

```diff
-spca = SparsePCA(
-    alpha=0, ridge_alpha=0, n_components=2, random_state=global_random_seed
-)
-pca = PCA(n_components=2, random_state=global_random_seed)
+spca = SparsePCA(
+    alpha=0, ridge_alpha=0, n_components=2, random_state=rng
+)
+pca = PCA(n_components=2, random_state=rng)
```

@DeaMariaLeon (Contributor, Author) commented Apr 17, 2025

Thank you.
I need to ask (maybe @glemaitre?) why it's sometimes OK to use rng (from np.random.RandomState(global_random_seed)) and sometimes the seed, global_random_seed, directly.

@jeremiedbb (Member)

global_random_seed is a number like 0, 1, 42, ...
np.random.RandomState is an object with a state: when you ask it to generate a random value, its state changes, so the next time you ask it gives a different value.
If you pass an integer as random_state to an estimator, it's used to create and seed a RandomState object. Therefore, each time the estimator is fitted, it starts from the same seed and produces the same results.

So to answer your question, it depends on whether you need to reproduce the exact same sequence of generated values multiple times, or whether it doesn't matter.
For instance, if you want to compare the results of an estimator between float32 and float64 input, you want both estimators to use the same sequence of generated values.
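
A minimal sketch of that distinction using plain NumPy draws (the seed value 42 is just an example):

```python
import numpy as np

# Passing an integer around: every consumer builds a fresh RandomState
# from it, so each one replays exactly the same sequence.
a = np.random.RandomState(42).rand(3)
b = np.random.RandomState(42).rand(3)
assert np.allclose(a, b)

# Sharing one RandomState object: its internal state advances on every
# draw, so successive consumers see different values.
rng = np.random.RandomState(42)
c = rng.rand(3)
d = rng.rand(3)
assert not np.allclose(c, d)
```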

@DeaMariaLeon (Contributor, Author)

Thank you very much @jeremiedbb... I see that my question wasn't clear.

In the file modified by this PR, some tests set SparsePCA's random_state parameter as random_state=rng, where rng = np.random.RandomState(global_random_seed). For example, test_fit_transform_tall.

But some tests set the random_state parameter as random_state=global_random_seed. Like test_sparse_pca_numerical_consistency.

I guess it doesn't matter, as there is check_random_state(seed), which "turn[s] seed into a np.random.RandomState instance".

In the end they might both be the same, except that changing the random_state of test_sparse_pca_numerical_consistency to random_state=rng was failing.

Anyway, your previous explanation confirmed my understanding. :-)
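
For reference, a small sketch of check_random_state's documented behavior, which is why both spellings are accepted by estimators:

```python
import numpy as np
from sklearn.utils import check_random_state

# An int is wrapped into a freshly seeded RandomState on every call, so
# random_state=global_random_seed restarts the stream at each fit.
rs = check_random_state(42)
assert isinstance(rs, np.random.RandomState)

# An existing RandomState passes through unchanged, state and all, so
# random_state=rng shares one advancing stream across estimators.
rng = np.random.RandomState(42)
assert check_random_state(rng) is rng
```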

@jeremiedbb (Member)

> I wonder why test_mini_batch_fit_transform is kept. It is skipped all the time.

Let's just remove it. It's useless to keep dead code in the code base.

@DeaMariaLeon (Contributor, Author) commented Apr 17, 2025

> I think you can add global_random_seed to
>
>   • test_transform_nan
>   • test_sparse_pca_numerical_consistency

I added the seed to those.
test_sparse_pca_numerical_consistency was failing a lot and I tried many things... maybe now it's too sparse? The "many things" were increasing rtol, adding atol, etc.

Updated list of tests using global_random_seed:

  • test_fit_transform
  • test_fit_transform_parallel
  • test_transform_nan
  • test_fit_transform_tall
  • test_initialization
  • test_scaling_fit_transform
  • test_pca_vs_spca
  • test_sparse_pca_numerical_consistency
  • test_sparse_pca_inverse_transform
  • test_transform_inverse_transform_round_trip
@jeremiedbb (Member)

> test_sparse_pca_numerical_consistency was failing a lot and I tried many things... maybe now it's too sparse? The "many things" were increasing rtol, adding atol, etc.

Looking into it a bit, it seems to be an issue with the convergence criterion of DictionaryLearning (which is used by SparsePCA). It's not protected against rounding errors leading to negative values, so it can happen that in one case the algorithm keeps running for a few more iterations than in the other.

It's not a major issue and not really related to this PR, so I just switched the dataset for one that is easier for SparsePCA.
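
The PR itself doesn't show the replacement dataset here, but as a hypothetical illustration, "easier for SparsePCA" usually means data whose sparse components are well separated and only lightly noised:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_components, n_features = 50, 3, 12

# Hypothetical dictionary: each component is non-zero on its own block
# of features, so the sparse structure is unambiguous to recover.
V = np.zeros((n_components, n_features))
for k in range(n_components):
    V[k, 4 * k : 4 * k + 4] = rng.randn(4)

U = rng.randn(n_samples, n_components)
X = U @ V + 0.01 * rng.randn(n_samples, n_features)  # low noise level
```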

@jeremiedbb (Member) left a comment

LGTM. Thanks

@jeremiedbb jeremiedbb enabled auto-merge (squash) April 17, 2025 18:11
@jeremiedbb jeremiedbb merged commit ceac4a8 into scikit-learn:main Apr 17, 2025
34 checks passed
@DeaMariaLeon DeaMariaLeon deleted the tests branch April 18, 2025 06:34
lucyleeow pushed a commit to EmilyXinyi/scikit-learn that referenced this pull request Apr 23, 2025
TST use `global_random_seed` in `sklearn/decomposition/tests/test_sparse_pca.py` (scikit-learn#31213)

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>