
MAINT Remove some unwanted side effects in our test suite #29584


Merged: 16 commits merged into scikit-learn:main from ogrisel:test-side-effects on Oct 21, 2024

Conversation

ogrisel (Member) commented Jul 30, 2024

I have observed that my local pytest runs execute the tests in a random order with recent Python versions, and I suspect this is causing a few failures that are not reproduced on the CI.

Let's try to reproduce this on the CI in this draft PR by doing a few runs with different seeds using the pytest-random-order plugin.

EDIT: removing side-effects is also useful for running the tests in parallel with threads, e.g. for #30007.
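
For illustration only (the test names are hypothetical, not taken from the scikit-learn suite), this is the kind of order-dependent failure that randomized ordering can reveal: a dataset defined as a module-level attribute is mutated in place by one test, which silently breaks any test that happens to run after it.

import numpy as np

# Hypothetical test module sharing a dataset as a module-level attribute.
X_SHARED = np.array([[1.0, 2.0], [3.0, 4.0]])


def test_missing_value_path():
    # In-place mutation of the shared array: this side effect leaks into
    # every test that runs after this one.
    X_SHARED[0, 0] = np.nan
    assert np.isnan(X_SHARED).any()


def test_column_means():
    # Passes when run before test_missing_value_path; fails when run
    # after it because the shared data now contains NaN.
    assert np.allclose(X_SHARED.mean(axis=0), [2.0, 3.0])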


github-actions bot commented Jul 30, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: dd4ca3c. Link to the linter CI: here


ogrisel commented Jul 30, 2024

So indeed this does trigger failures that look similar to what I observe locally with my unintentionally randomized pytest command:

FAILED gaussian_process/tests/test_kernels.py::test_kernel_gradient[kernel17] - AssertionError: 
FAILED metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_complex_pipeline[from_estimator-clf1] - Failed: DID NOT RAISE <class 'sklearn.exceptions.NotFittedError'>
FAILED tests/test_pipeline.py::test_set_feature_union_passthrough - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_fit_predict_on_pipeline - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_memory - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union_weights - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union[csr_array] - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union[csr_matrix] - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_score_samples_pca_lof - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_preprocessing_svm - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_transform - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_pca_svm - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_anova - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_classes_property - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union_passthrough_get_feature_names_out_false - ValueError: Input X contains NaN.
= 15 failed, 33250 passed, 2527 skipped, 92 xfailed, 52 xpassed, 6874 warnings in 1758.33s (0:29:18) =

full log.

Let me re-trigger the CI with a different ordering to check if the list of failures is stable or not.


ogrisel commented Jul 30, 2024

Here are the failing tests for the second random ordering of the tests:

FAILED tests/test_pipeline.py::test_classes_property - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_memory - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_preprocessing_svm - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union_passthrough_get_feature_names_out_true - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union_passthrough_get_feature_names_out_false - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_fit_predict_on_pipeline - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_anova - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_set_feature_union_passthrough - ValueError: Input X contains NaN.
FAILED gaussian_process/tests/test_kernels.py::test_kernel_gradient[kernel16] - AssertionError: 
FAILED tests/test_kernel_approximation.py::test_additive_chi2_sampler[csr_array] - ValueError: Negative values in data passed to X in AdditiveChi2Sampler.tran...
FAILED tests/test_kernel_approximation.py::test_additive_chi2_sampler[csr_matrix] - ValueError: Negative values in data passed to X in AdditiveChi2Sampler.tran...
FAILED tree/tests/test_tree.py::test_regression_tree_missing_values_toy[squared_error-X2-ExtraTreeRegressor] - AssertionError
= 12 failed, 33253 passed, 2527 skipped, 92 xfailed, 52 xpassed, 6863 warnings in 1283.62s (0:21:23) =

So indeed the list of failures depends on the execution order of the tests, which reveals unintended side effects in our test suite.


ogrisel commented Jul 30, 2024

Those test files have in common that they define test datasets at import time (as test module attributes) and then reuse them multiple times, sometimes modifying them in place (as is the case in test_feature_union and test_pipeline_missing_values_leniency).

I did a minimal fix for the test_pipeline.py file in a7a15bf to confirm. However, we might want a cleaner fix, e.g. using fixtures or loading iris in each file instead of sharing module-level numpy arrays.
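
As a rough sketch of the fixture-based option (the test name below is made up; this is not the exact change from a7a15bf), each test can request a freshly loaded copy of the data so that in-place edits cannot leak across tests:

import numpy as np
import pytest
from sklearn.datasets import load_iris


@pytest.fixture
def iris_data():
    # load_iris returns fresh arrays on each call, so every test gets
    # its own copy and in-place modifications stay local to that test.
    X, y = load_iris(return_X_y=True)
    return X, y


def test_pipeline_with_missing_values(iris_data):
    X, y = iris_data
    X[0, 0] = np.nan  # only affects this test's copy of the data
    assert np.isnan(X).sum() == 1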


ogrisel commented Aug 20, 2024

All tests pass locally for me now. I cannot reproduce the last failure:

_ test_regression_tree_missing_values_toy[squared_error-X3-ExtraTreeRegressor] _
[gw0] darwin -- Python 3.12.5 /usr/local/miniconda/envs/testvenv/bin/python

Tree = <class 'sklearn.tree._classes.ExtraTreeRegressor'>
X = array([[ 1.],
       [ 2.],
       [ 3.],
       [nan],
       [ 6.],
       [nan]])
criterion = 'squared_error'

    @pytest.mark.parametrize("Tree", [DecisionTreeRegressor, ExtraTreeRegressor])
    @pytest.mark.parametrize(
        "X",
        [
            # missing values will go left for greedy splits
            np.array([np.nan, 2, np.nan, 4, 5, 6]),
            np.array([np.nan, np.nan, 3, 4, 5, 6]),
            # missing values will go right for greedy splits
criterion  = 'squared_error'
tree       = ExtraTreeRegressor(random_state=0)
tree_ref   = ExtraTreeRegressor(random_state=0)
y          = array([0, 1, 2, 3, 4, 5])

I really do not understand what causes this and why it would be related to test ordering. This test creates new estimator instances each time and the data should not be modified.


ogrisel commented Aug 20, 2024

With the extra debugging info in the assertion, we get the following for the Linux_Runs pylatest_conda_forge_mkl, the macOS pylatest_conda_forge_mkl, and the Windows pymin_conda_forge_mkl builds:

        impurity = tree.tree_.impurity
>       assert all(impurity >= 0), impurity.min()  # MSE should always be positive
E       AssertionError: -12.25

So this is definitely above the rounding error level...

Let me try to push a new build with a different ordering seed.
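
For context, the extra debugging info relies on Python's two-expression assert statement: the expression after the comma is evaluated only when the assertion fails and becomes the AssertionError message. A minimal sketch with made-up values:

import numpy as np

impurity = np.array([0.5, -12.25, 0.1])  # hypothetical node impurities

# MSE should always be positive; when the check fails, impurity.min()
# is shown in the report, e.g. "AssertionError: -12.25".
assert all(impurity >= 0), impurity.min()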

@ogrisel added the module:test-suite label on Oct 10, 2024
ogrisel (Member, Author) left a comment

For some reason, the CI no longer reproduces the last reported unsolved failure.

Let's remove the test-order shuffling so that we can merge the fixes for the unwanted test side effects.

@ogrisel changed the title from "DEBUG TESTS: check if test re-ordering causes failures on CI" to "Remove some unwanted side effects in our test suite" on Oct 10, 2024
@ogrisel marked this pull request as ready for review on October 10, 2024 08:56

ogrisel commented Oct 10, 2024

Assuming CI stays green after the last commit, this should be ready for review @glemaitre and @lesteve.

@ogrisel added the Quick Review label on Oct 10, 2024
@glemaitre changed the title from "Remove some unwanted side effects in our test suite" to "MAINT Remove some unwanted side effects in our test suite" on Oct 11, 2024
@glemaitre self-requested a review on October 11, 2024 09:18

lesteve commented Oct 14, 2024

In case it is ever needed in the future, this is the kind of thing that was used in the CI to reproduce the failures (reverted in dd4ca3c):

python -m pip install pytest-random-order

python -m pytest --random-order-seed=1 sklearn

@lesteve merged commit bc8eb66 into scikit-learn:main on Oct 21, 2024 (40 checks passed)

lesteve commented Oct 21, 2024

Merging, thanks!

@ogrisel deleted the test-side-effects branch on November 26, 2024 15:11
Labels: Build / CI, module:test-suite, No Changelog Needed, Quick Review