TST Speed up some of the slowest tests #27383

Merged: 8 commits merged into scikit-learn:main from speed-up-slow-tests on Sep 16, 2023

Conversation

Member

@ogrisel ogrisel commented Sep 15, 2023

Mostly by reducing the dataset sizes.

  • slowest tests for pylatest_conda_forge_mkl on main:
============================= slowest 20 durations =============================
31.32s call     decomposition/_dict_learning.py::sklearn.decomposition._dict_learning.DictionaryLearning
26.73s call     ensemble/tests/test_gradient_boosting.py::test_classification_synthetic[5-log_loss]
25.88s call     neighbors/tests/test_neighbors.py::test_knn_forcing_backend[kd_tree-multiprocessing]
25.35s call     neighbors/tests/test_neighbors.py::test_knn_forcing_backend[auto-multiprocessing]
23.85s call     ensemble/tests/test_gradient_boosting.py::test_classification_synthetic[5-exponential]
22.09s call     neighbors/tests/test_neighbors.py::test_knn_forcing_backend[ball_tree-multiprocessing]
18.64s call     tree/tests/test_tree.py::test_min_impurity_decrease
17.27s call     decomposition/tests/test_sparse_pca.py::test_mini_batch_correct_shapes
14.74s call     experimental/tests/test_enable_successive_halving.py::test_imports_strategies
14.27s call     utils/tests/test_estimator_checks.py::test_check_estimator_clones
13.12s call     experimental/tests/test_enable_iterative_imputer.py::test_imports_strategies
12.71s call     decomposition/tests/test_dict_learning.py::test_cd_work_on_joblib_memmapped_data
11.94s call     utils/tests/test_parallel.py::test_dispatch_config_parallel[2]
11.44s call     feature_selection/tests/test_sequential.py::test_n_features_to_select_stopping_criterion[forward]
11.35s call     ensemble/tests/test_gradient_boosting.py::test_gradient_boosting_validation_fraction
11.28s call     decomposition/tests/test_sparse_pca.py::test_transform_inverse_transform_round_trip[MiniBatchSparsePCA]
10.55s call     decomposition/tests/test_dict_learning.py::test_sparse_encode_shapes_omp
10.18s call     feature_selection/tests/test_rfe.py::test_rfe_cv_groups
9.86s call     feature_extraction/image.py::sklearn.feature_extraction.image.PatchExtractor
9.83s call     preprocessing/tests/test_target_encoder.py::test_fit_transform_not_associated_with_y_if_ordinal_categorical_is_not[5]
  • same report on this PR:
============================= slowest 20 durations =============================
13.73s call     utils/tests/test_parallel.py::test_dispatch_config_parallel[2]
11.97s call     experimental/tests/test_enable_successive_halving.py::test_imports_strategies
11.21s call     decomposition/tests/test_dict_learning.py::test_cd_work_on_joblib_memmapped_data
11.06s call     model_selection/tests/test_split.py::test_nested_cv
10.62s call     decomposition/tests/test_sparse_pca.py::test_transform_inverse_transform_round_trip[MiniBatchSparsePCA]
10.44s call     decomposition/tests/test_dict_learning.py::test_dict_learning_lassocd_readonly_data
10.43s call     feature_extraction/image.py::sklearn.feature_extraction.image.PatchExtractor
10.37s call     feature_selection/tests/test_sequential.py::test_n_features_to_select_stopping_criterion[forward]
10.32s call     experimental/tests/test_enable_iterative_imputer.py::test_imports_strategies
9.76s call     decomposition/tests/test_dict_learning.py::test_sparse_encode_shapes_omp
9.11s call     metrics/tests/test_pairwise.py::test_sparse_manhattan_readonly_dataset
9.04s call     linear_model/tests/test_sgd.py::test_multi_core_gridsearch_and_early_stopping
8.89s call     ensemble/tests/test_gradient_boosting.py::test_gradient_boosting_validation_fraction
8.79s call     utils/tests/test_estimator_checks.py::test_check_estimator_clones
8.73s call     inspection/tests/test_partial_dependence.py::test_partial_dependence_non_null_weight_idx[1-estimator2]
8.69s call     preprocessing/tests/test_target_encoder.py::test_fit_transform_not_associated_with_y_if_ordinal_categorical_is_not[5]
8.67s call     inspection/tests/test_partial_dependence.py::test_partial_dependence_non_null_weight_idx[0-estimator2]
8.55s call     model_selection/tests/test_search.py::test_random_search_bad_cv
8.34s call     manifold/tests/test_t_sne.py::test_tsne_with_mahalanobis_distance
8.10s call     inspection/tests/test_partial_dependence.py::test_partial_dependence_equivalence_equal_sample_weight[LogisticRegression-data1]
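
For reference, the two reports above come from pytest's built-in duration profiling, e.g. `pytest --durations=20 sklearn/` (the exact target path and options used on the CI may differ).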

@github-actions

github-actions bot commented Sep 15, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 8cb8ff3.

@@ -2044,10 +2043,10 @@ def test_same_radius_neighbors_parallel(algorithm):
     assert_allclose(graph, graph_parallel)

-@pytest.mark.parametrize("backend", JOBLIB_BACKENDS)
+@pytest.mark.parametrize("backend", ["threading", "loky"])
Member

we don't want to test multiprocessing and sequential backends?

Member Author

The multiprocessing case is very slow on macOS (and maybe Windows as well): this backend creates new processes from scratch on each call, which can take several tens of seconds on the overloaded macOS Azure CI.

I think the loky backend is enough to test for multiprocessing-related problems in our Cython code. The main difference between multiprocessing and loky is that loky reuses its workers across calls, making it much more efficient on overloaded machines.
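
A minimal sketch (not from the PR; `square` is just a stand-in workload) of forcing a specific joblib backend. It illustrates why loky is cheaper on a loaded machine: its worker processes persist across consecutive Parallel calls, while the multiprocessing backend starts fresh ones each time.

```python
from joblib import Parallel, delayed, parallel_backend

def square(x):
    return x * x

# Force the loky backend explicitly (joblib's default process-based backend).
with parallel_backend("loky", n_jobs=2):
    first = Parallel()(delayed(square)(i) for i in range(4))
    # This second call reuses the same worker pool; with the
    # "multiprocessing" backend, new processes would be spawned again.
    second = Parallel()(delayed(square)(i) for i in range(4))

print(first, second)  # [0, 1, 4, 9] [0, 1, 4, 9]
```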

Member Author

Also, sequential execution is already tested everywhere else in test_neighbors.py (as it's the effective backend for the default value of n_jobs).

-    ... n_samples=100, n_components=15, n_features=20, n_nonzero_coefs=10,
+    ... n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
Member

curious, how much speedup are we getting here? are we THAT slow? 😁

Member Author

On the macOS CI build, this test is sometimes one of the slowest and can last more than 20s on an overloaded CI runner.

Locally it's fast, but decreasing this dataset size makes it ~3x faster.
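
For context, those parameter names match sklearn.datasets.make_sparse_coded_signal, presumably called in one of the dictionary-learning doctests. A standalone call with the reduced size (the random_state here is arbitrary) would look like:

```python
from sklearn.datasets import make_sparse_coded_signal

# Reduced problem size from the diff above: each of the 30 samples is a
# sparse combination of 10 of the 15 dictionary atoms.
X, dictionary, code = make_sparse_coded_signal(
    n_samples=30,
    n_components=15,
    n_features=20,
    n_nonzero_coefs=10,
    random_state=42,
)
```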

Member Author

I updated the description of the PR to show the impact on the macOS CI between main and this PR.

I observed in the latest run that the docstring of MiniBatchDictionaryLearning has a similar problem (and a bug in the direction of the assertion about the sparsity of the transformed data). I pushed a96dee2 to fix it and improve the speed. We will have to wait for the end of this CI run to see the concrete impact of that last commit.
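
For illustration only (not the exact docstring code): a sparsity check of that kind asserts on the fraction of exactly-zero coefficients in the transformed data, and flipping the comparison operator inverts its meaning.

```python
import numpy as np

code = np.array([[0.0, 0.0, 1.2], [0.0, 0.9, 0.0]])  # toy sparse codes

sparsity = np.mean(code == 0)  # fraction of exactly-zero coefficients
assert sparsity > 0.5  # the codes are mostly sparse
# A bug "in the direction of the assertion" would be `sparsity < 0.5`,
# which instead claims the codes are mostly dense.
```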

Member Author

I updated the description to take a96dee2 into account.

Member

@glemaitre glemaitre left a comment

Apart from my comments, I am fine with the changes.

I assume that all the merging regarding the sparse containers hurt the test run times. While merging, I tried to make sure that we were not duplicating tests for no reason.

I am wondering if we could have been slightly more lenient with the tests: make sure that fit/predict/transform work, but not always run the statistical checks in every test.

@ogrisel
Member Author

ogrisel commented Sep 15, 2023

I am wondering if we could have been slightly more lenient with the tests: make sure that fit/predict/transform work, but not always run the statistical checks in every test.

In the long term, we will remove the tests for scipy sparse matrices and only keep those for scipy sparse arrays.

In the short term, we could maybe change the definition of the *_CONTAINERS lists (see the sketch after this list) to:

  • only include scipy sparse matrices for versions of scipy < 1.8 (already the case);
  • only include scipy sparse arrays for recent versions of scipy;
  • run both only if a specific environment variable is set; we would enable this on one of the fast runners with a recent version of scipy only.
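
A hypothetical sketch of that proposal (the list name and environment variable are illustrative, not the actual scikit-learn code):

```python
import os

import scipy
import scipy.sparse as sp
from packaging.version import parse

if parse(scipy.__version__) < parse("1.8"):
    # Sparse arrays do not exist before scipy 1.8: matrices only.
    CSR_CONTAINERS = [sp.csr_matrix]
elif os.environ.get("SKLEARN_TESTS_ALL_SPARSE_CONTAINERS"):
    # Opt-in: exercise both containers, e.g. on one fast CI runner.
    CSR_CONTAINERS = [sp.csr_matrix, sp.csr_array]
else:
    # Default on recent scipy: sparse arrays only.
    CSR_CONTAINERS = [sp.csr_array]
```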

@ogrisel ogrisel removed the Quick Review label Sep 15, 2023
Member

@lorentzenchr lorentzenchr left a comment

LGTM

@lorentzenchr
Member

@ogrisel ready to merge or anything else?

@ogrisel ogrisel merged commit 1fbf5fb into scikit-learn:main Sep 16, 2023
@ogrisel ogrisel deleted the speed-up-slow-tests branch September 16, 2023 09:09
@ogrisel
Member Author

ogrisel commented Sep 16, 2023

Merged. There are more tests to optimize down the line, but with diminishing returns.

Another PR that should greatly speed up the tests is the refactoring of the tests for the pairwise argkmin & radius reductions: #27281.

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023