Skip to content

MAINT make ClassifierChain test more efficient #28705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 27, 2024

Conversation

ogrisel
Copy link
Member

@ogrisel ogrisel commented Mar 27, 2024

 As suggested in #27662 (comment).

@ogrisel ogrisel added module:test-suite everything related to our tests No Changelog Needed labels Mar 27, 2024
@ogrisel
Copy link
Member Author

ogrisel commented Mar 27, 2024

For the sake of curiosity, I triggered the pypy CI to check whether that would fix the timeout problem before we get to merge #28704 to main.

Copy link

github-actions bot commented Mar 27, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fe84886. Link to the linter CI: here

@lesteve
Copy link
Member

lesteve commented Mar 27, 2024

I doubt that this is a single issue but more a combination of multiple small issues. In particular there seems to be quite some variability in the slowest tests. In the last four logs where PyPy passed, none of them mentioned this particular test, see below.

Wild-guess: the fact that a test is slow does not necessarily indicate an issue in this particular test but it could be that memory usage is high (for example caused by another test running in parallel because we are using xdist or even previous tests) and the OS starts to swap and everything become slow.

============================= slowest 20 durations =============================
595.61s call     datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_pairs
177.22s call     tests/test_multioutput.py::test_multi_target_regression
175.45s call     tests/test_multioutput.py::test_classifier_chain_vs_independent_models
72.16s call     tests/test_multioutput.py::test_base_chain_fit_and_predict_with_sparse_data_and_cv[csr_array]
49.70s call     tests/test_multioutput.py::test_base_chain_fit_and_predict_with_sparse_data_and_cv[csr_matrix]
41.11s call     utils/tests/test_estimator_checks.py::test_check_estimator_clones
40.39s call     datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_people
39.32s call     decomposition/tests/test_dict_learning.py::test_dict_learning_online_numerical_consistency[lars]
35.22s call     preprocessing/tests/test_target_encoder.py::test_fit_transform_not_associated_with_y_if_ordinal_categorical_is_not[29]
30.50s call     decomposition/_dict_learning.py::sklearn.decomposition._dict_learning.dict_learning
28.77s call     decomposition/tests/test_dict_learning.py::test_cd_work_on_joblib_memmapped_data
28.01s call     feature_selection/tests/test_rfe.py::test_rfe_cv_groups
26.76s call     utils/tests/test_estimator_checks.py::test_check_estimator
26.20s call     tests/test_multioutput.py::test_classifier_chain_fit_and_predict[predict_log_proba-predict_log_proba]
20.63s call     tests/test_kernel_approximation.py::test_polynomial_count_sketch[0-3-5000-1]
18.78s call     ensemble/tests/test_stacking.py::test_stacking_classifier_multilabel_auto_predict[True-auto]
17.41s call     utils/tests/test_parallel.py::test_configuration_passes_through_to_joblib[multiprocessing-2]
17.01s call     ensemble/tests/test_stacking.py::test_stacking_classifier_multilabel_auto_predict[True-predict]
16.77s call     ensemble/tests/test_stacking.py::test_stacking_classifier_multilabel_auto_predict[False-predict]
16.53s call     tests/test_config.py::test_config_threadsafe_joblib[multiprocessing]
============================= slowest 20 durations =============================
244.85s call     tests/test_common.py::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),min_resources='smallest',param_grid={'logisticregression__C':[0.1,1.0]},random_state=0)-check_dict_unchanged]
228.94s call     tests/test_common.py::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_pipeline_consistency]
206.38s call     utils/tests/test_extmath.py::test_incremental_weighted_mean_and_variance[1-1e-08-1e-08--10000000.0]
171.47s call     tests/test_common.py::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_estimator_sparse_array]
166.41s call     tree/_classes.py::sklearn.tree._classes.DecisionTreeClassifier
158.86s call     tree/tests/test_tree.py::test_sparse_input[digits-DecisionTreeClassifier]
105.92s call     utils/tests/test_estimator_checks.py::test_check_estimator_clones
101.60s call     tests/test_multioutput.py::test_base_chain_fit_and_predict_with_sparse_data_and_cv[csr_array]
80.18s call     tests/test_random_projection.py::test_inverse_transform[28-False-SparseRandomProjection-2-2-coo_matrix]
73.72s call     tests/test_random_projection.py::test_random_projection_dtype_match[float64-float64-GaussianRandomProjection]
52.50s call     datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_pairs
50.28s call     datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_people
49.38s call     linear_model/tests/test_huber.py::test_huber_sample_weights[csr_matrix]
46.43s call     tests/test_multioutput.py::test_classifier_chain_fit_and_predict[predict_proba-decision_function]
42.10s call     decomposition/tests/test_dict_learning.py::test_dict_learning_online_numerical_consistency[lars]
41.02s call     preprocessing/tests/test_target_encoder.py::test_fit_transform_not_associated_with_y_if_ordinal_categorical_is_not[28]
40.13s call     tests/test_common.py::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_estimator_sparse_array]
39.87s call     decomposition/tests/test_sparse_pca.py::test_spca_n_components_[None-MiniBatchSparsePCA]
34.23s call     decomposition/tests/test_sparse_pca.py::test_spca_max_iter_None_deprecation
33.41s call     decomposition/_dict_learning.py::sklearn.decomposition._dict_learning.dict_learning
============================= slowest 20 durations =============================
212.77s call     tests/test_common.py::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_supervised_y_2d]
197.74s call     tests/test_common.py::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_methods_subset_invariance]
77.04s call     utils/tests/test_estimator_checks.py::test_check_estimator
69.94s call     tests/test_common.py::test_search_cv[RandomizedSearchCV(cv=2,estimator=LogisticRegression(),param_distributions={'C':[0.1,1.0]},random_state=0)-check_classifiers_train]
37.06s call     datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_pairs
33.16s call     utils/tests/test_estimator_checks.py::test_check_estimator_clones
30.80s call     preprocessing/tests/test_target_encoder.py::test_fit_transform_not_associated_with_y_if_ordinal_categorical_is_not[8]
28.30s call     datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_people
26.63s call     tests/test_config.py::test_config_threadsafe_joblib[multiprocessing]
26.53s call     decomposition/tests/test_dict_learning.py::test_cd_work_on_joblib_memmapped_data
25.61s call     tests/test_multioutput.py::test_multi_target_sparse_regression[csr_array]
25.32s call     decomposition/tests/test_dict_learning.py::test_dict_learning_online_numerical_consistency[lars]
23.93s call     utils/tests/test_extmath.py::test_randomized_eigsh_reconst_low_rank[10-7]
20.50s call     feature_selection/tests/test_rfe.py::test_rfe_cv_groups
16.92s call     decomposition/_dict_learning.py::sklearn.decomposition._dict_learning.dict_learning
15.72s call     tree/tests/test_tree.py::test_min_weight_fraction_leaf_with_min_samples_leaf_on_dense_input[DecisionTreeRegressor]
14.05s call     tests/test_common.py::test_search_cv[GridSearchCV(cv=2,estimator=LogisticRegression(),param_grid={'C':[0.1,1.0]})-check_classifiers_one_label_sample_weights]
13.90s call     ensemble/tests/test_bagging.py::test_regression
13.79s call     tests/test_common.py::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),min_resources='smallest',param_grid={'ridge__alpha':[0.1,1.0]},random_state=0)-check_estimators_dtypes]
13.47s call     ensemble/tests/test_stacking.py::test_stacking_classifier_multilabel_auto_predict[True-auto]
============================= slowest 20 durations =============================
358.09s call     tests/test_config.py::test_config_threadsafe_joblib[multiprocessing]
224.04s call     tests/test_random_projection.py::test_correct_RandomProjection_dimensions_embedding[0-coo_array]
215.13s call     tree/tests/test_tree.py::test_sparse_input[digits-DecisionTreeClassifier]
175.26s call     tree/_classes.py::sklearn.tree._classes.DecisionTreeClassifier
171.64s call     tests/test_random_projection.py::test_random_projection_dtype_match[int32-float64-SparseRandomProjection]
56.14s call     tests/test_kernel_approximation.py::test_polynomial_count_sketch[0-3-5000-1]
48.04s call     tests/test_random_projection.py::test_random_projection_numerical_consistency[SparseRandomProjection]
47.30s call     tests/test_random_projection.py::test_inverse_transform[0-False-SparseRandomProjection-1000-1000-coo_matrix]
45.57s call     utils/tests/test_estimator_checks.py::test_check_estimator_clones
41.98s call     tests/test_multioutput.py::test_base_chain_fit_and_predict_with_sparse_data_and_cv[csr_array]
36.68s call     datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_pairs
34.31s call     tests/test_common.py::test_f_contiguous_array_estimator[TSNE]
34.28s call     preprocessing/tests/test_target_encoder.py::test_fit_transform_not_associated_with_y_if_ordinal_categorical_is_not[0]
30.91s call     datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_people
27.36s call     tree/_classes.py::sklearn.tree._classes.ExtraTreeClassifier
26.65s call     decomposition/tests/test_dict_learning.py::test_cd_work_on_joblib_memmapped_data
25.55s call     feature_selection/tests/test_rfe.py::test_rfe_cv_groups
25.35s call     decomposition/tests/test_dict_learning.py::test_dict_learning_online_numerical_consistency[lars]
20.88s call     tree/tests/test_tree.py::test_sparse_input[multilabel-DecisionTreeRegressor]
20.53s call     utils/tests/test_estimator_checks.py::test_check_estimator

@ogrisel
Copy link
Member Author

ogrisel commented Mar 27, 2024

And the change in this PR broken the 32-bit linux tests:

    @pytest.mark.parametrize("order_type", [list, np.array, tuple])
    def test_classifier_chain_tuple_order(order_type):
        X = [[1, 2, 3], [4, 5, 6], [1.5, 2.5, 3.5]]
        y = [[3, 2], [2, 3], [3, 2]]
        order = order_type([1, 0])
    
        chain = ClassifierChain(RandomForestClassifier(n_estimators=2), order=order)
    
        chain.fit(X, y)
        X_test = [[1.5, 2.5, 3.5]]
        y_test = [[3, 2]]
>       assert_array_almost_equal(chain.predict(X_test), y_test)
E       AssertionError: 
E       Arrays are not almost equal to 6 decimals
E       
E       Mismatched elements: 1 / 2 (50%)
E       Max absolute difference: 1.
E       Max relative difference: 0.33333333
E        x: array([[2., 2.]])
E        y: array([[3, 2]])

not sure why though, as I would not have expected this test to be particularly sensitive to platform bitness...

@ogrisel
Copy link
Member Author

ogrisel commented Mar 27, 2024

Let's wait for the CI to complete for the sake of curiosity but I don't plan to invest more time on this. We can close without merging once the CI has ended.

@ogrisel
Copy link
Member Author

ogrisel commented Mar 27, 2024

not sure why though, as I would not have expected this test to be particularly sensitive to platform bitness...

Platform bitness is probably not the culprit but we do need to fix the random seed.

@ogrisel
Copy link
Member Author

ogrisel commented Mar 27, 2024

I pushed a commit to fix the random_state in the test. If CI is green this time, I think it's it's worth merging, irrespective of the (lack of) impact on the PyPy CI. It's just a small test optim and reliability improvement in the end.

@jeremiedbb
Copy link
Member

jeremiedbb commented Mar 27, 2024

from #28705 (comment)

datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_pairs
datasets/_lfw.py::sklearn.datasets._lfw.fetch_lfw_people

The exemples recently added in the docstrings of fetchers have the effect that we now fetch the lfw datasets in all CI jobs.

EDIT: I opened a dedicated issue for that (#28707) since it impacts all CI jobs

Copy link
Member

@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

For completeness, I suspect this does not significantly improve the situation for PyPy.

@lesteve lesteve enabled auto-merge (squash) March 27, 2024 13:16
@lesteve
Copy link
Member

lesteve commented Mar 27, 2024

I set auto-merge and pushed a commit to trigger the CI since the PyPy build timed out.

@lesteve lesteve merged commit 52138e2 into scikit-learn:main Mar 27, 2024
@ogrisel ogrisel deleted the classifier-chain-test-optim branch March 27, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:test-suite everything related to our tests No Changelog Needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants