
MAINT Remove some unwanted side effects in our test suite #29584


Merged: 16 commits merged into scikit-learn:main from ogrisel:test-side-effects on Oct 21, 2024

Conversation

ogrisel (Member) commented Jul 30, 2024

I have observed that my local pytest runs execute the tests in a random order with recent Python versions, and I suspect this is causing a few failures that are not reproduced on the CI.

Let's try to reproduce this on the CI in this draft PR by doing a few runs with different seeds using the pytest-random-order plugin.

EDIT: removing side-effects is also useful for running the tests in parallel with threads, e.g. for #30007.
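
For illustration only (the test names are hypothetical, not taken from the scikit-learn suite), this is the kind of order-dependent failure that randomized ordering can reveal: a dataset defined as a module-level attribute is mutated in place by one test, which silently breaks any test that happens to run after it.

import numpy as np

# Hypothetical test module sharing a dataset as a module-level attribute.
X_SHARED = np.array([[1.0, 2.0], [3.0, 4.0]])


def test_missing_value_path():
    # In-place mutation of the shared array: this side effect leaks into
    # every test that runs after this one.
    X_SHARED[0, 0] = np.nan
    assert np.isnan(X_SHARED).any()


def test_column_means():
    # Passes when run before test_missing_value_path; fails when run
    # after it because the shared data now contains NaN.
    assert np.allclose(X_SHARED.mean(axis=0), [2.0, 3.0])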


github-actions bot commented Jul 30, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: dd4ca3c. Link to the linter CI: here


ogrisel commented Jul 30, 2024

So indeed this does trigger failures that look similar to what I observe locally with my unintentionally randomized pytest command:

FAILED gaussian_process/tests/test_kernels.py::test_kernel_gradient[kernel17] - AssertionError: 
FAILED metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_complex_pipeline[from_estimator-clf1] - Failed: DID NOT RAISE <class 'sklearn.exceptions.NotFittedError'>
FAILED tests/test_pipeline.py::test_set_feature_union_passthrough - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_fit_predict_on_pipeline - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_memory - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union_weights - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union[csr_array] - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union[csr_matrix] - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_score_samples_pca_lof - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_preprocessing_svm - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_transform - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_pca_svm - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_anova - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_classes_property - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union_passthrough_get_feature_names_out_false - ValueError: Input X contains NaN.
= 15 failed, 33250 passed, 2527 skipped, 92 xfailed, 52 xpassed, 6874 warnings in 1758.33s (0:29:18) =

full log.

Let me re-trigger the CI with a different ordering to check if the list of failures is stable or not.


ogrisel commented Jul 30, 2024

Here are the failing tests for the second random ordering of the tests:

FAILED tests/test_pipeline.py::test_classes_property - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_memory - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_preprocessing_svm - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union_passthrough_get_feature_names_out_true - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_feature_union_passthrough_get_feature_names_out_false - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_fit_predict_on_pipeline - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_pipeline_methods_anova - ValueError: Input X contains NaN.
FAILED tests/test_pipeline.py::test_set_feature_union_passthrough - ValueError: Input X contains NaN.
FAILED gaussian_process/tests/test_kernels.py::test_kernel_gradient[kernel16] - AssertionError: 
FAILED tests/test_kernel_approximation.py::test_additive_chi2_sampler[csr_array] - ValueError: Negative values in data passed to X in AdditiveChi2Sampler.tran...
FAILED tests/test_kernel_approximation.py::test_additive_chi2_sampler[csr_matrix] - ValueError: Negative values in data passed to X in AdditiveChi2Sampler.tran...
FAILED tree/tests/test_tree.py::test_regression_tree_missing_values_toy[squared_error-X2-ExtraTreeRegressor] - AssertionError
= 12 failed, 33253 passed, 2527 skipped, 92 xfailed, 52 xpassed, 6863 warnings in 1283.62s (0:21:23) =

So indeed the list of failures depends on the execution order of the tests, which reveals unintended side effects in our test suite.


ogrisel commented Jul 30, 2024

Those test files have in common that they define test datasets at import time (as test module attributes) and then reuse them multiple times, sometimes modifying them in place (as is the case in test_feature_union and test_pipeline_missing_values_leniency).

I did a minimal fix for the test_pipeline.py file in a7a15bf to confirm. However, we might want a cleaner fix, e.g. using fixtures or loading iris in each file instead of sharing module-level numpy arrays.
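
As a rough sketch of the fixture-based option (the test name below is made up; this is not the exact change from a7a15bf), each test can request a freshly loaded copy of the data so that in-place edits cannot leak across tests:

import numpy as np
import pytest
from sklearn.datasets import load_iris


@pytest.fixture
def iris_data():
    # load_iris returns fresh arrays on each call, so every test gets
    # its own copy and in-place modifications stay local to that test.
    X, y = load_iris(return_X_y=True)
    return X, y


def test_pipeline_with_missing_values(iris_data):
    X, y = iris_data
    X[0, 0] = np.nan  # only affects this test's copy of the data
    assert np.isnan(X).sum() == 1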


ogrisel commented Aug 20, 2024

All tests pass locally for me now. I cannot reproduce the last failure:

_ test_regression_tree_missing_values_toy[squared_error-X3-ExtraTreeRegressor] _
[gw0] darwin -- Python 3.12.5 /usr/local/miniconda/envs/testvenv/bin/python

Tree = <class 'sklearn.tree._classes.ExtraTreeRegressor'>
X = array([[ 1.],
       [ 2.],
       [ 3.],
       [nan],
       [ 6.],
       [nan]])
criterion = 'squared_error'

    @pytest.mark.parametrize("Tree", [DecisionTreeRegressor, ExtraTreeRegressor])
    @pytest.mark.parametrize(
        "X",
        [
            # missing values will go left for greedy splits
            np.array([np.nan, 2, np.nan, 4, 5, 6]),
            np.array([np.nan, np.nan, 3, 4, 5, 6]),
            # missing values will go right for greedy splits
criterion  = 'squared_error'
tree       = ExtraTreeRegressor(random_state=0)
tree_ref   = ExtraTreeRegressor(random_state=0)
y          = array([0, 1, 2, 3, 4, 5])

I really do not understand what causes this and why it would be related to test ordering. This test creates new estimator instances each time and the data should not be modified.


ogrisel commented Aug 20, 2024

With the extra debugging info in the assertion, we get the following for the Linux_Runs pylatest_conda_forge_mkl, the macOS pylatest_conda_forge_mkl, and the Windows pymin_conda_forge_mkl builds:

        impurity = tree.tree_.impurity
>       assert all(impurity >= 0), impurity.min()  # MSE should always be positive
E       AssertionError: -12.25

So this is definitely above the rounding error level...

Let me try to push a new build with a different ordering seed.
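
For context, the extra debugging info relies on Python's two-expression assert statement: the expression after the comma is evaluated only when the assertion fails and becomes the AssertionError message. A minimal sketch with made-up values:

import numpy as np

impurity = np.array([0.5, -12.25, 0.1])  # hypothetical node impurities

# MSE should always be positive; when the check fails, impurity.min()
# is shown in the report, e.g. "AssertionError: -12.25".
assert all(impurity >= 0), impurity.min()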

@ogrisel added the module:test-suite label on Oct 10, 2024
ogrisel (Member, Author) left a comment

For some reason, the CI no longer reproduces the last reported unsolved failure.

Let's remove the test-order shuffling so that we can merge the fixes for the unwanted test side effects.

@ogrisel changed the title from "DEBUG TESTS: check if test re-ordering causes failures on CI" to "Remove some unwanted side effects in our test suite" on Oct 10, 2024
@ogrisel marked this pull request as ready for review on October 10, 2024 08:56

ogrisel commented Oct 10, 2024

Assuming CI stays green after the last commit, this should be ready for review @glemaitre and @lesteve.

@ogrisel added the Quick Review label on Oct 10, 2024
@glemaitre changed the title from "Remove some unwanted side effects in our test suite" to "MAINT Remove some unwanted side effects in our test suite" on Oct 11, 2024
@glemaitre self-requested a review on October 11, 2024 09:18

lesteve commented Oct 14, 2024

In case it is ever needed in the future, this is the kind of thing that was used in the CI to reproduce the failures (reverted in dd4ca3c):

python -m pip install pytest-random-order

python -m pytest --random-order-seed=1 sklearn

@lesteve merged commit bc8eb66 into scikit-learn:main on Oct 21, 2024 (40 checks passed)

lesteve commented Oct 21, 2024

Merging, thanks!

@ogrisel deleted the test-side-effects branch on November 26, 2024 15:11
Labels: Build / CI, module:test-suite, No Changelog Needed, Quick Review