# Improve tests by using global_random_seed fixture to make them less seed-sensitive #22827
I labeled this issue as hard, as I expect some tests to be hard to upgrade. However, I expect the majority of the tests to be easy to convert; we just cannot know in advance which ones will be easy.
I updated the original message with the feature introduced in #23026 where we can run the CI with all the seeds by pushing a commit with the proper message. @ogrisel Feel free to update the wording if you find it unclear.
Note, running `git commit --allow-empty` allows you to have a commit message without any changes.
Working on `sklearn/linear_model/tests/test_bayes.py`.
To anyone interested in contributing: remember to add an empty commit to your PR, in accordance with this issue (see the commit message structure in the original message below).
Commenting here as a friendly reminder, @glemaitre.
Thanks @DeaMariaLeon for the reminder. However, briefly looking at …
Sorry for wasting your time. I was looking at the wrong file. 🫣
Per Guillaume, commenting here that I opened a PR for …
Working on … Edit: these tests need to be removed from the list, as the PRs have already been merged: `sklearn/decomposition/tests/test_truncated_svd.py` #30922.
### Context: the new `global_random_seed` fixture

#22749 introduces a new `global_random_seed` fixture to make it possible to run the same test with any seed between 0 and 99 included. By default, when `SKLEARN_TESTS_GLOBAL_RANDOM_SEED` is not set, this fixture deterministically returns 42 to keep test runs deterministic by default and avoid any unnecessary disruption. However, different CI builds set this seed to other arbitrary (still deterministic) values, and nightly scheduled builds on Azure now use `SKLEARN_TESTS_GLOBAL_RANDOM_SEED="any"` to progressively explore all the seeds in the 0-99 range.

### Motivation
The aim of this new fixture is to make sure that we avoid writing tests that artificially depend on a specific value of the random seed, thereby unknowingly hiding a real mathematical problem in our code (see e.g. #21701 (comment)). At the same time, we still want to keep the tests deterministic and independent of the execution order by default, to avoid introducing unnecessary maintenance overhead.

In addition to making the tests seed-insensitive, randomizing them with different seeds has the side benefit of making their assertions robust to small numerical variations that could otherwise stem from other sources, such as platform-specific or dependency-specific numerical rounding variations that we do not cover in our existing CI infrastructure.
More details about the fixture are available in the online dev doc for the `SKLEARN_TESTS_GLOBAL_RANDOM_SEED` env variable: https://scikit-learn.org/dev/computing/parallelism.html#environment-variables

### Guidelines to convert existing tests
We probably do not need to convert all scikit-learn tests to use this fixture. We should instead focus our efforts on tests that actually check important mathematical properties of our estimators or model evaluation tools. For instance, there is no need to check the seed-insensitivity of tests that only check the exception messages raised when passing invalid inputs.
To avoid having to review huge PRs that impact many files at once and can lead to conflicts, let's open PRs that edit at most one test file at a time. For instance, use a title such as the example below:
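An illustrative title, mirroring the linked PR #22862 (substitute the test file you are actually editing):

```
TST ensure that sklearn/metrics/tests/test_pairwise_distances_reduction.py is seed insensitive
```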
Please reference #22827 in the description of the PR and put the full filename of the test file you edit in the title of the PR.

To convert an existing test with a fixed seed, the general pattern is to rewrite a function such as:
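A minimal sketch of the "before" state (the function name, data generation, and assertion are placeholders for the test's actual logic):

```python
import numpy as np


def test_some_function():
    # Hard-coded seed: the test only ever exercises one RNG state.
    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 5))  # placeholder data generation
    # ... fit a model / compute a quantity on X and assert on the result ...
```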
to:
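The corresponding "after" state, where the fixture value seeds the RNG (same placeholder names as above):

```python
import numpy as np


def test_some_function(global_random_seed):
    # The seed now comes from the fixture (42 by default, 0-99 on the CI).
    rng = np.random.RandomState(global_random_seed)
    X = rng.normal(size=(100, 5))  # same placeholder data generation
    # ... same test body and assertions ...
```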
and then check that the test function is actually seed-insensitive by running it locally with all seeds between 0 and 99 (this can be slow: only run it for one specific test at a time!):
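Assuming a hypothetical module path and the placeholder test name from the sketches above, the invocation would look like:

```
SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -v sklearn/some_module/tests/test_some_module.py -k test_some_function
```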
If this is not the case, the test will probably need to be reworked to find a more stable way to check the interesting mathematical properties:
- if the failing assertions are related to the generalization performance of a model, maybe the training set size should be slightly bigger (while keeping the test runtime as fast as possible), the data should have fewer noisy features, or the training should be done with stronger regularization. Or, more simply, we can relax the tolerance threshold while ensuring it does not become trivial (e.g. by comparing to a trivial baseline);
- if the failing assertions depend on some regularities of a synthetically generated dataset, consider decreasing the noise level of the dataset;
- some tests might also fail when encountering data that triggers edge cases, such as (near-)tied distances between datapoints that make the outcome of the computation unstable. Changing the data generation code to significantly decrease the likelihood of those edge cases (e.g. by adding more noise to the input features) can help in those cases.
Note: in most cases, tweaking the tolerances of the assertions is not the appropriate way to make the tests pass. The first thing to do is to try to understand what the test is checking, whether the test is correct, and whether its expectations are realistic. Then, if the test seems correct and should pass for all random seeds but doesn't, investigate whether the estimator or function is bugged. As a last resort, tolerances can be loosened if the test is considered valid but aims to check a statistical property that is highly sensitive to the random seed.
In some cases, it might be very hard to write a seed-insensitive test that tolerates all seeds between 0 and 99 while still running in less than 1 s. In those (hopefully rare) cases, I think it's fine to reduce the range of admissible seeds with the following pattern:
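A minimal sketch of such a pattern (the test name, modulus, and comment are illustrative, not a fixed project convention):

```python
import numpy as np


def test_hard_statistical_property(global_random_seed):
    # Fold the 0-99 fixture range onto 0-9: this test is too slow or too
    # seed-sensitive to pass for all 100 seeds. See #22827 for context.
    rng = np.random.RandomState(global_random_seed % 10)
    # ... rest of the test ...
```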
It is also possible to run the CI of a pull request with all the seeds for the tests that use the `global_random_seed` fixture, by pushing a commit message with the structure shown below.
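Assuming the convention introduced in #23026, the commit message is expected to look roughly like this, with the `[all random seeds]` marker on the title line followed by the names of the tests to run:

```
<title> [all random seeds]
<test_name_1>
<test_name_2>
```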
Note, running `git commit --allow-empty` allows you to push such a commit message without including any file changes.

See the following issue for more details on why testing on the CI is necessary: …
### List of test modules to upgrade
- `sklearn/metrics/tests/test_pairwise_distances_reduction.py` (#22862)

Note that some of those files might not have any test to update.