BLD Use Cython's shared memoryview utility to reduce wheel size #31151

Open

wants to merge 21 commits into base: main

Conversation

@thomasjpfan (Member) commented Apr 5, 2025

Reference Issues/PRs

Towards #27767

What does this implement/fix? Explain your changes.

This PR updates meson.build to use Cython 3.1's --generate-shared= functionality.
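
As a rough sketch (not the exact diff in this PR; the target and variable names are illustrative, and cython / cython_args stand for the Cython program and argument list already defined elsewhere in the build files), the two Cython 3.1 flags fit together in a Meson build like this: the shared memoryview utility is generated and compiled once as its own extension module, sklearn._cyutility, and every other Cython module is then cythonized with --shared so that it reuses that module instead of embedding its own copy of the utility code.

# Generate the shared utility C source once with --generate-shared.
_cyutility_c = custom_target('_cyutility_c',
  output: '_cyutility.c',
  command: [cython, '--generate-shared=@OUTPUT@'],
)

# Compile it as a regular extension module, importable as sklearn._cyutility.
py.extension_module('_cyutility',
  _cyutility_c,
  subdir: 'sklearn',
  install: true,
)

# Every other Cython module gets --shared, so the memoryview helpers are looked
# up in sklearn._cyutility at import time instead of being duplicated in each
# compiled extension.
cython_args += ['--shared=sklearn._cyutility']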

Any other comments?

When testing this locally on macOS, I get these sizes for scikit_learn-1.7.dev0-cp312-cp312-macosx_15_0_arm64.whl:

  • This PR: 8.6 MB
  • main: 15.2 MB

We can see that the wheels from this PR are smaller by ~25%:

The wheels from this PR: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544?pr=31151
The wheels from main: https://github.com/scikit-learn/scikit-learn/actions/runs/14278107805

github-actions bot commented Apr 5, 2025

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.0.


--- benchmarks/bench_hist_gradient_boosting_adult.py
+++ benchmarks/bench_hist_gradient_boosting_adult.py
@@ -46,7 +46,7 @@
     toc = time()
     roc_auc = roc_auc_score(target_test, predicted_proba_test[:, 1])
     acc = accuracy_score(target_test, predicted_test)
-    print(f"predicted in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}, ACC: {acc :.4f}")
+    print(f"predicted in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}, ACC: {acc:.4f}")
 
 
 data = fetch_openml(data_id=179, as_frame=True)  # adult dataset

--- benchmarks/bench_hist_gradient_boosting_higgsboson.py
+++ benchmarks/bench_hist_gradient_boosting_higgsboson.py
@@ -74,7 +74,7 @@
     toc = time()
     roc_auc = roc_auc_score(target_test, predicted_proba_test[:, 1])
     acc = accuracy_score(target_test, predicted_test)
-    print(f"predicted in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}, ACC: {acc :.4f}")
+    print(f"predicted in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}, ACC: {acc:.4f}")
 
 
 df = load_data()

--- build_tools/get_comment.py
+++ build_tools/get_comment.py
@@ -55,9 +55,7 @@
     if end not in log:
         return ""
     res = (
-        "-----------------------------------------------\n"
-        f"### {title}\n\n"
-        f"{message}\n\n"
+        f"-----------------------------------------------\n### {title}\n\n{message}\n\n"
     )
     if details:
         res += (

--- examples/applications/plot_species_distribution_modeling.py
+++ examples/applications/plot_species_distribution_modeling.py
@@ -109,7 +109,7 @@
 
 
 def plot_species_distribution(
-    species=("bradypus_variegatus_0", "microryzomys_minutus_0")
+    species=("bradypus_variegatus_0", "microryzomys_minutus_0"),
 ):
     """
     Plot the species distribution.

--- examples/applications/plot_time_series_lagged_features.py
+++ examples/applications/plot_time_series_lagged_features.py
@@ -265,7 +265,7 @@
     time = cv_results["fit_time"]
     scores["fit_time"].append(f"{time.mean():.2f} ± {time.std():.2f} s")
 
-    scores["loss"].append(f"quantile {int(quantile*100)}")
+    scores["loss"].append(f"quantile {int(quantile * 100)}")
     for key, value in cv_results.items():
         if key.startswith("test_"):
             metric = key.split("test_")[1]

--- examples/applications/plot_topics_extraction_with_nmf_lda.py
+++ examples/applications/plot_topics_extraction_with_nmf_lda.py
@@ -50,7 +50,7 @@
 
         ax = axes[topic_idx]
         ax.barh(top_features, weights, height=0.7)
-        ax.set_title(f"Topic {topic_idx +1}", fontdict={"fontsize": 30})
+        ax.set_title(f"Topic {topic_idx + 1}", fontdict={"fontsize": 30})
         ax.tick_params(axis="both", which="major", labelsize=20)
         for i in "top right left".split():
             ax.spines[i].set_visible(False)

--- examples/ensemble/plot_bias_variance.py
+++ examples/ensemble/plot_bias_variance.py
@@ -177,8 +177,8 @@
 
     plt.subplot(2, n_estimators, n_estimators + n + 1)
     plt.plot(X_test, y_error, "r", label="$error(x)$")
-    plt.plot(X_test, y_bias, "b", label="$bias^2(x)$"),
-    plt.plot(X_test, y_var, "g", label="$variance(x)$"),
+    (plt.plot(X_test, y_bias, "b", label="$bias^2(x)$"),)
+    (plt.plot(X_test, y_var, "g", label="$variance(x)$"),)
     plt.plot(X_test, y_noise, "c", label="$noise(x)$")
 
     plt.xlim([-5, 5])

--- examples/linear_model/plot_tweedie_regression_insurance_claims.py
+++ examples/linear_model/plot_tweedie_regression_insurance_claims.py
@@ -606,8 +606,9 @@
             "predicted, frequency*severity model": np.sum(
                 exposure * glm_freq.predict(X) * glm_sev.predict(X)
             ),
-            "predicted, tweedie, power=%.2f"
-            % glm_pure_premium.power: np.sum(exposure * glm_pure_premium.predict(X)),
+            "predicted, tweedie, power=%.2f" % glm_pure_premium.power: np.sum(
+                exposure * glm_pure_premium.predict(X)
+            ),
         }
     )
 

--- examples/manifold/plot_lle_digits.py
+++ examples/manifold/plot_lle_digits.py
@@ -10,7 +10,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 # %%
 # Load digits dataset
 # -------------------

--- examples/manifold/plot_manifold_sphere.py
+++ examples/manifold/plot_manifold_sphere.py
@@ -50,7 +50,7 @@
 t = random_state.rand(n_samples) * np.pi
 
 # Sever the poles from the sphere.
-indices = (t < (np.pi - (np.pi / 8))) & (t > ((np.pi / 8)))
+indices = (t < (np.pi - (np.pi / 8))) & (t > (np.pi / 8))
 colors = p[indices]
 x, y, z = (
     np.sin(t[indices]) * np.cos(p[indices]),

--- examples/model_selection/plot_likelihood_ratios.py
+++ examples/model_selection/plot_likelihood_ratios.py
@@ -40,7 +40,7 @@
 from sklearn.datasets import make_classification
 
 X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)
-print(f"Percentage of people carrying the disease: {100*y.mean():.2f}%")
+print(f"Percentage of people carrying the disease: {100 * y.mean():.2f}%")
 
 # %%
 # A machine learning model is built to diagnose if a person with some given

--- examples/model_selection/plot_roc.py
+++ examples/model_selection/plot_roc.py
@@ -152,9 +152,9 @@
 #
 # We can briefly demo the effect of :func:`numpy.ravel`:
 
-print(f"y_score:\n{y_score[0:2,:]}")
+print(f"y_score:\n{y_score[0:2, :]}")
 print()
-print(f"y_score.ravel():\n{y_score[0:2,:].ravel()}")
+print(f"y_score.ravel():\n{y_score[0:2, :].ravel()}")
 
 # %%
 # In a multi-class classification setup with highly imbalanced classes,
@@ -359,7 +359,7 @@
     plt.plot(
         fpr_grid,
         mean_tpr[ix],
-        label=f"Mean {label_a} vs {label_b} (AUC = {mean_score :.2f})",
+        label=f"Mean {label_a} vs {label_b} (AUC = {mean_score:.2f})",
         linestyle=":",
         linewidth=4,
     )

--- maint_tools/bump-dependencies-versions.py
+++ maint_tools/bump-dependencies-versions.py
@@ -43,7 +43,7 @@
         for file_info in release_info:
             if (
                 file_info["packagetype"] == "bdist_wheel"
-                and f'cp{python_version.replace(".", "")}' in file_info["filename"]
+                and f"cp{python_version.replace('.', '')}" in file_info["filename"]
                 and not file_info["yanked"]
             ):
                 compatible_versions.append(ver)

--- sklearn/_loss/tests/test_loss.py
+++ sklearn/_loss/tests/test_loss.py
@@ -203,7 +203,8 @@
 
 
 @pytest.mark.parametrize(
-    "loss, y_true_success, y_true_fail", Y_COMMON_PARAMS + Y_TRUE_PARAMS  # type: ignore[operator]
+    "loss, y_true_success, y_true_fail",
+    Y_COMMON_PARAMS + Y_TRUE_PARAMS,  # type: ignore[operator]
 )
 def test_loss_boundary_y_true(loss, y_true_success, y_true_fail):
     """Test boundaries of y_true for loss functions."""
@@ -214,7 +215,8 @@
 
 
 @pytest.mark.parametrize(
-    "loss, y_pred_success, y_pred_fail", Y_COMMON_PARAMS + Y_PRED_PARAMS  # type: ignore[operator]
+    "loss, y_pred_success, y_pred_fail",
+    Y_COMMON_PARAMS + Y_PRED_PARAMS,  # type: ignore[operator]
 )
 def test_loss_boundary_y_pred(loss, y_pred_success, y_pred_fail):
     """Test boundaries of y_pred for loss functions."""
@@ -497,12 +499,14 @@
         sample_weight=sample_weight,
         loss_out=out_l1,
     )
-    loss.closs.loss(
-        y_true=y_true,
-        raw_prediction=raw_prediction,
-        sample_weight=sample_weight,
-        loss_out=out_l2,
-    ),
+    (
+        loss.closs.loss(
+            y_true=y_true,
+            raw_prediction=raw_prediction,
+            sample_weight=sample_weight,
+            loss_out=out_l2,
+        ),
+    )
     assert_allclose(out_l1, out_l2)
     loss.gradient(
         y_true=y_true,

--- sklearn/cluster/_feature_agglomeration.py
+++ sklearn/cluster/_feature_agglomeration.py
@@ -6,7 +6,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import numpy as np
 from scipy.sparse import issparse
 

--- sklearn/cross_decomposition/tests/test_pls.py
+++ sklearn/cross_decomposition/tests/test_pls.py
@@ -404,12 +404,12 @@
 
     X_orig = X.copy()
     with pytest.raises(AssertionError):
-        pls.transform(X, y, copy=False),
+        (pls.transform(X, y, copy=False),)
         assert_array_almost_equal(X, X_orig)
 
     X_orig = X.copy()
     with pytest.raises(AssertionError):
-        pls.predict(X, copy=False),
+        (pls.predict(X, copy=False),)
         assert_array_almost_equal(X, X_orig)
 
     # Make sure copy=True gives same transform and predictions as predict=False

--- sklearn/datasets/tests/test_openml.py
+++ sklearn/datasets/tests/test_openml.py
@@ -105,9 +105,9 @@
         )
 
     def _mock_urlopen_shared(url, has_gzip_header, expected_prefix, suffix):
-        assert url.startswith(
-            expected_prefix
-        ), f"{expected_prefix!r} does not match {url!r}"
+        assert url.startswith(expected_prefix), (
+            f"{expected_prefix!r} does not match {url!r}"
+        )
 
         data_file_name = _file_name(url, suffix)
         data_file_path = resources.files(data_module) / data_file_name
@@ -156,9 +156,9 @@
         )
 
     def _mock_urlopen_data_list(url, has_gzip_header):
-        assert url.startswith(
-            url_prefix_data_list
-        ), f"{url_prefix_data_list!r} does not match {url!r}"
+        assert url.startswith(url_prefix_data_list), (
+            f"{url_prefix_data_list!r} does not match {url!r}"
+        )
 
         data_file_name = _file_name(url, ".json")
         data_file_path = resources.files(data_module) / data_file_name

--- sklearn/datasets/tests/test_samples_generator.py
+++ sklearn/datasets/tests/test_samples_generator.py
@@ -138,17 +138,17 @@
             signs = signs.view(dtype="|S{0}".format(signs.strides[0])).ravel()
             unique_signs, cluster_index = np.unique(signs, return_inverse=True)
 
-            assert (
-                len(unique_signs) == n_clusters
-            ), "Wrong number of clusters, or not in distinct quadrants"
+            assert len(unique_signs) == n_clusters, (
+                "Wrong number of clusters, or not in distinct quadrants"
+            )
 
             clusters_by_class = defaultdict(set)
             for cluster, cls in zip(cluster_index, y):
                 clusters_by_class[cls].add(cluster)
             for clusters in clusters_by_class.values():
-                assert (
-                    len(clusters) == n_clusters_per_class
-                ), "Wrong number of clusters per class"
+                assert len(clusters) == n_clusters_per_class, (
+                    "Wrong number of clusters per class"
+                )
             assert len(clusters_by_class) == n_classes, "Wrong number of classes"
 
             assert_array_almost_equal(
@@ -412,9 +412,9 @@
     X, y = make_blobs(n_samples=n_samples, n_features=2, random_state=0)
 
     assert X.shape == (sum(n_samples), 2), "X shape mismatch"
-    assert all(
-        np.bincount(y, minlength=len(n_samples)) == n_samples
-    ), "Incorrect number of samples per blob"
+    assert all(np.bincount(y, minlength=len(n_samples)) == n_samples), (
+        "Incorrect number of samples per blob"
+    )
 
 
 def test_make_blobs_n_samples_list_with_centers():
@@ -426,9 +426,9 @@
     )
 
     assert X.shape == (sum(n_samples), 2), "X shape mismatch"
-    assert all(
-        np.bincount(y, minlength=len(n_samples)) == n_samples
-    ), "Incorrect number of samples per blob"
+    assert all(np.bincount(y, minlength=len(n_samples)) == n_samples), (
+        "Incorrect number of samples per blob"
+    )
     for i, (ctr, std) in enumerate(zip(centers, cluster_stds)):
         assert_almost_equal((X[y == i] - ctr).std(), std, 1, "Unexpected std")
 
@@ -441,9 +441,9 @@
     X, y = make_blobs(n_samples=n_samples, centers=centers, random_state=0)
 
     assert X.shape == (sum(n_samples), 2), "X shape mismatch"
-    assert all(
-        np.bincount(y, minlength=len(n_samples)) == n_samples
-    ), "Incorrect number of samples per blob"
+    assert all(np.bincount(y, minlength=len(n_samples)) == n_samples), (
+        "Incorrect number of samples per blob"
+    )
 
 
 def test_make_blobs_return_centers():
@@ -681,9 +681,9 @@
 
 def test_make_moons_unbalanced():
     X, y = make_moons(n_samples=(7, 5))
-    assert (
-        np.sum(y == 0) == 7 and np.sum(y == 1) == 5
-    ), "Number of samples in a moon is wrong"
+    assert np.sum(y == 0) == 7 and np.sum(y == 1) == 5, (
+        "Number of samples in a moon is wrong"
+    )
     assert X.shape == (12, 2), "X shape mismatch"
     assert y.shape == (12,), "y shape mismatch"
 

--- sklearn/ensemble/_bagging.py
+++ sklearn/ensemble/_bagging.py
@@ -3,7 +3,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import itertools
 import numbers
 from abc import ABCMeta, abstractmethod

--- sklearn/ensemble/_forest.py
+++ sklearn/ensemble/_forest.py
@@ -35,7 +35,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import threading
 from abc import ABCMeta, abstractmethod
 from numbers import Integral, Real

--- sklearn/ensemble/tests/test_forest.py
+++ sklearn/ensemble/tests/test_forest.py
@@ -168,11 +168,12 @@
     reg = ForestRegressor(n_estimators=5, criterion=criterion, random_state=1)
     reg.fit(X_reg, y_reg)
     score = reg.score(X_reg, y_reg)
-    assert (
-        score > 0.93
-    ), "Failed with max_features=None, criterion %s and score = %f" % (
-        criterion,
-        score,
+    assert score > 0.93, (
+        "Failed with max_features=None, criterion %s and score = %f"
+        % (
+            criterion,
+            score,
+        )
     )
 
     reg = ForestRegressor(
@@ -1068,10 +1069,10 @@
         node_weights = np.bincount(out, weights=weights)
         # drop inner nodes
         leaf_weights = node_weights[node_weights != 0]
-        assert (
-            np.min(leaf_weights) >= total_weight * est.min_weight_fraction_leaf
-        ), "Failed with {0} min_weight_fraction_leaf={1}".format(
-            name, est.min_weight_fraction_leaf
+        assert np.min(leaf_weights) >= total_weight * est.min_weight_fraction_leaf, (
+            "Failed with {0} min_weight_fraction_leaf={1}".format(
+                name, est.min_weight_fraction_leaf
+            )
         )
 
 

--- sklearn/experimental/enable_hist_gradient_boosting.py
+++ sklearn/experimental/enable_hist_gradient_boosting.py
@@ -13,7 +13,6 @@
 # Don't remove this file, we don't want to break users code just because the
 # feature isn't experimental anymore.
 
-
 import warnings
 
 warnings.warn(

--- sklearn/feature_selection/_univariate_selection.py
+++ sklearn/feature_selection/_univariate_selection.py
@@ -3,7 +3,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import warnings
 from numbers import Integral, Real
 

--- sklearn/gaussian_process/tests/test_gpc.py
+++ sklearn/gaussian_process/tests/test_gpc.py
@@ -147,8 +147,9 @@
     # Define a dummy optimizer that simply tests 10 random hyperparameters
     def optimizer(obj_func, initial_theta, bounds):
         rng = np.random.RandomState(global_random_seed)
-        theta_opt, func_min = initial_theta, obj_func(
-            initial_theta, eval_gradient=False
+        theta_opt, func_min = (
+            initial_theta,
+            obj_func(initial_theta, eval_gradient=False),
         )
         for _ in range(10):
             theta = np.atleast_1d(

--- sklearn/gaussian_process/tests/test_gpr.py
+++ sklearn/gaussian_process/tests/test_gpr.py
@@ -394,8 +394,9 @@
     # Define a dummy optimizer that simply tests 50 random hyperparameters
     def optimizer(obj_func, initial_theta, bounds):
         rng = np.random.RandomState(0)
-        theta_opt, func_min = initial_theta, obj_func(
-            initial_theta, eval_gradient=False
+        theta_opt, func_min = (
+            initial_theta,
+            obj_func(initial_theta, eval_gradient=False),
         )
         for _ in range(50):
             theta = np.atleast_1d(

--- sklearn/inspection/_plot/tests/test_plot_partial_dependence.py
+++ sklearn/inspection/_plot/tests/test_plot_partial_dependence.py
@@ -1186,9 +1186,9 @@
     )
 
     line = disp.lines_[0, 0, -1]
-    assert (
-        line.get_color() == expected_colors[0]
-    ), f"{line.get_color()}!={expected_colors[0]}\n{line_kw} and {pd_line_kw}"
+    assert line.get_color() == expected_colors[0], (
+        f"{line.get_color()}!={expected_colors[0]}\n{line_kw} and {pd_line_kw}"
+    )
     if pd_line_kw is not None:
         if "linestyle" in pd_line_kw:
             assert line.get_linestyle() == pd_line_kw["linestyle"]
@@ -1198,9 +1198,9 @@
         assert line.get_linestyle() == "--"
 
     line = disp.lines_[0, 0, 0]
-    assert (
-        line.get_color() == expected_colors[1]
-    ), f"{line.get_color()}!={expected_colors[1]}"
+    assert line.get_color() == expected_colors[1], (
+        f"{line.get_color()}!={expected_colors[1]}"
+    )
     if ice_lines_kw is not None:
         if "linestyle" in ice_lines_kw:
             assert line.get_linestyle() == ice_lines_kw["linestyle"]

--- sklearn/linear_model/_glm/_newton_solver.py
+++ sklearn/linear_model/_glm/_newton_solver.py
@@ -254,7 +254,7 @@
             check = loss_improvement <= t * armijo_term
             if is_verbose:
                 print(
-                    f"    line search iteration={i+1}, step size={t}\n"
+                    f"    line search iteration={i + 1}, step size={t}\n"
                     f"      check loss improvement <= armijo term: {loss_improvement} "
                     f"<= {t * armijo_term} {check}"
                 )
@@ -300,7 +300,7 @@
         self.raw_prediction = raw
         if is_verbose:
             print(
-                f"    line search successful after {i+1} iterations with "
+                f"    line search successful after {i + 1} iterations with "
                 f"loss={self.loss_value}."
             )
 

--- sklearn/linear_model/_linear_loss.py
+++ sklearn/linear_model/_linear_loss.py
@@ -537,9 +537,9 @@
                 # The L2 penalty enters the Hessian on the diagonal only. To add those
                 # terms, we use a flattened view of the array.
                 order = "C" if hess.flags.c_contiguous else "F"
-                hess.reshape(-1, order=order)[
-                    : (n_features * n_dof) : (n_dof + 1)
-                ] += l2_reg_strength
+                hess.reshape(-1, order=order)[: (n_features * n_dof) : (n_dof + 1)] += (
+                    l2_reg_strength
+                )
 
             if self.fit_intercept:
                 # With intercept included as added column to X, the hessian becomes

--- sklearn/linear_model/_ridge.py
+++ sklearn/linear_model/_ridge.py
@@ -5,7 +5,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import numbers
 import warnings
 from abc import ABCMeta, abstractmethod

--- sklearn/linear_model/_theil_sen.py
+++ sklearn/linear_model/_theil_sen.py
@@ -5,7 +5,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import warnings
 from itertools import combinations
 from numbers import Integral, Real

--- sklearn/linear_model/tests/test_ridge.py
+++ sklearn/linear_model/tests/test_ridge.py
@@ -860,9 +860,9 @@
     loo_ridge.fit(X, y)
     gcv_ridge.fit(X, y)
 
-    assert gcv_ridge.alpha_ == pytest.approx(
-        loo_ridge.alpha_
-    ), f"{gcv_ridge.alpha_=}, {loo_ridge.alpha_=}"
+    assert gcv_ridge.alpha_ == pytest.approx(loo_ridge.alpha_), (
+        f"{gcv_ridge.alpha_=}, {loo_ridge.alpha_=}"
+    )
     assert_allclose(gcv_ridge.coef_, loo_ridge.coef_, rtol=1e-3)
     assert_allclose(gcv_ridge.intercept_, loo_ridge.intercept_, rtol=1e-3)
 
@@ -1522,9 +1522,9 @@
     X = rng.randn(n_samples, n_features)
 
     ridge_est = Estimator(alphas=alphas)
-    assert (
-        ridge_est.alphas is alphas
-    ), f"`alphas` was mutated in `{Estimator.__name__}.__init__`"
+    assert ridge_est.alphas is alphas, (
+        f"`alphas` was mutated in `{Estimator.__name__}.__init__`"
+    )
 
     ridge_est.fit(X, y)
     assert_array_equal(ridge_est.alphas, np.asarray(alphas))

--- sklearn/manifold/_spectral_embedding.py
+++ sklearn/manifold/_spectral_embedding.py
@@ -3,7 +3,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import warnings
 from numbers import Integral, Real
 

--- sklearn/manifold/_t_sne.py
+++ sklearn/manifold/_t_sne.py
@@ -949,9 +949,9 @@
             P = _joint_probabilities(distances, self.perplexity, self.verbose)
             assert np.all(np.isfinite(P)), "All probabilities should be finite"
             assert np.all(P >= 0), "All probabilities should be non-negative"
-            assert np.all(
-                P <= 1
-            ), "All probabilities should be less or then equal to one"
+            assert np.all(P <= 1), (
+                "All probabilities should be less or then equal to one"
+            )
 
         else:
             # Compute the number of nearest neighbors to find.

--- sklearn/metrics/_ranking.py
+++ sklearn/metrics/_ranking.py
@@ -10,7 +10,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import warnings
 from functools import partial
 from numbers import Integral, Real

--- sklearn/metrics/cluster/_supervised.py
+++ sklearn/metrics/cluster/_supervised.py
@@ -7,7 +7,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import warnings
 from math import log
 from numbers import Real

--- sklearn/metrics/cluster/_unsupervised.py
+++ sklearn/metrics/cluster/_unsupervised.py
@@ -3,7 +3,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import functools
 from numbers import Integral
 

--- sklearn/metrics/tests/test_common.py
+++ sklearn/metrics/tests/test_common.py
@@ -641,7 +641,6 @@
 
 @pytest.mark.parametrize("name", sorted(NOT_SYMMETRIC_METRICS))
 def test_not_symmetric_metric(name):
-
     # Test the symmetry of score and loss functions
     random_state = check_random_state(0)
     metric = ALL_METRICS[name]
@@ -1005,7 +1004,8 @@
 @pytest.mark.parametrize("metric", CLASSIFICATION_METRICS.values())
 @pytest.mark.parametrize(
     "y_true, y_score",
-    invalids_nan_inf +
+    invalids_nan_inf
+    +
     # Add an additional case for classification only
     # non-regression test for:
     # https://github.com/scikit-learn/scikit-learn/issues/6809
@@ -2104,7 +2104,6 @@
 
 
 def check_array_api_metric_pairwise(metric, array_namespace, device, dtype_name):
-
     X_np = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=dtype_name)
     Y_np = np.array([[0.2, 0.3, 0.4], [0.5, 0.6, 0.7]], dtype=dtype_name)
 

--- sklearn/metrics/tests/test_pairwise_distances_reduction.py
+++ sklearn/metrics/tests/test_pairwise_distances_reduction.py
@@ -228,9 +228,9 @@
     # on average. Yielding too many results would make the test slow (because
     # checking the results is expensive for large result sets), yielding 0 most
     # of the time would make the test useless.
-    assert (
-        precomputed_dists is not None or metric is not None
-    ), "Either metric or precomputed_dists must be provided."
+    assert precomputed_dists is not None or metric is not None, (
+        "Either metric or precomputed_dists must be provided."
+    )
 
     if precomputed_dists is None:
         assert X is not None

--- sklearn/mixture/tests/test_bayesian_mixture.py
+++ sklearn/mixture/tests/test_bayesian_mixture.py
@@ -118,7 +118,7 @@
     )
     msg = (
         "The parameter 'degrees_of_freedom_prior' should be greater than"
-        f" {n_features -1}, but got {bad_degrees_of_freedom_prior_:.3f}."
+        f" {n_features - 1}, but got {bad_degrees_of_freedom_prior_:.3f}."
     )
     with pytest.raises(ValueError, match=msg):
         bgmm.fit(X)

--- sklearn/model_selection/_validation.py
+++ sklearn/model_selection/_validation.py
@@ -6,7 +6,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import numbers
 import time
 import warnings
@@ -819,9 +818,9 @@
     progress_msg = ""
     if verbose > 2:
         if split_progress is not None:
-            progress_msg = f" {split_progress[0]+1}/{split_progress[1]}"
+            progress_msg = f" {split_progress[0] + 1}/{split_progress[1]}"
         if candidate_progress and verbose > 9:
-            progress_msg += f"; {candidate_progress[0]+1}/{candidate_progress[1]}"
+            progress_msg += f"; {candidate_progress[0] + 1}/{candidate_progress[1]}"
 
     if verbose > 1:
         if parameters is None:

--- sklearn/model_selection/tests/test_search.py
+++ sklearn/model_selection/tests/test_search.py
@@ -2422,9 +2422,9 @@
     for _pairwise_setting in [True, False]:
         est.set_params(pairwise=_pairwise_setting)
         cv = GridSearchCV(est, {"n_neighbors": [10]})
-        assert (
-            _pairwise_setting == cv.__sklearn_tags__().input_tags.pairwise
-        ), attr_message
+        assert _pairwise_setting == cv.__sklearn_tags__().input_tags.pairwise, (
+            attr_message
+        )
 
 
 def test_search_cv_pairwise_property_equivalence_of_precomputed():

--- sklearn/model_selection/tests/test_split.py
+++ sklearn/model_selection/tests/test_split.py
@@ -886,9 +886,9 @@
         bf = stats.binom(n_splits, p)
         for count in idx_counts:
             prob = bf.pmf(count)
-            assert (
-                prob > threshold
-            ), "An index is not drawn with chance corresponding to even draws"
+            assert prob > threshold, (
+                "An index is not drawn with chance corresponding to even draws"
+            )
 
     for n_samples in (6, 22):
         groups = np.array((n_samples // 2) * [0, 1])

--- sklearn/multioutput.py
+++ sklearn/multioutput.py
@@ -8,7 +8,6 @@
 # Authors: The scikit-learn developers
 # SPDX-License-Identifier: BSD-3-Clause
 
-
 import warnings
 from abc import ABCMeta, abstractmethod
 from numbers import Integral
@@ -687,7 +686,6 @@
             )
 
         if self.base_estimator != "deprecated":
-
             warning_msg = (
                 "`base_estimator` as an argument was deprecated in 1.7 and will be"
                 " removed in 1.9. Use `estimator` instead."

--- sklearn/neighbors/tests/test_neighbors.py
+++ sklearn/neighbors/tests/test_neighbors.py
@@ -653,10 +653,12 @@
             assert_allclose(np.concatenate(list(ind)), np.concatenate(list(ind1)))
 
         for i in range(len(results) - 1):
-            assert_allclose(
-                np.concatenate(list(results[i][0])),
-                np.concatenate(list(results[i + 1][0])),
-            ),
+            (
+                assert_allclose(
+                    np.concatenate(list(results[i][0])),
+                    np.concatenate(list(results[i + 1][0])),
+                ),
+            )
             assert_allclose(
                 np.concatenate(list(results[i][1])),
                 np.concatenate(list(results[i + 1][1])),

--- sklearn/preprocessing/tests/test_function_transformer.py
+++ sklearn/preprocessing/tests/test_function_transformer.py
@@ -36,13 +36,13 @@
     )
 
     # The function should only have received X.
-    assert args_store == [
-        X
-    ], "Incorrect positional arguments passed to func: {args}".format(args=args_store)
+    assert args_store == [X], (
+        "Incorrect positional arguments passed to func: {args}".format(args=args_store)
+    )
 
-    assert (
-        not kwargs_store
-    ), "Unexpected keyword arguments passed to func: {args}".format(args=kwargs_store)
+    assert not kwargs_store, (
+        "Unexpected keyword arguments passed to func: {args}".format(args=kwargs_store)
+    )
 
     # reset the argument stores.
     args_store[:] = []
@@ -56,13 +56,13 @@
     )
 
     # The function should have received X
-    assert args_store == [
-        X
-    ], "Incorrect positional arguments passed to func: {args}".format(args=args_store)
+    assert args_store == [X], (
+        "Incorrect positional arguments passed to func: {args}".format(args=args_store)
+    )
 
-    assert (
-        not kwargs_store
-    ), "Unexpected keyword arguments passed to func: {args}".format(args=kwargs_store)
+    assert not kwargs_store, (
+        "Unexpected keyword arguments passed to func: {args}".format(args=kwargs_store)
+    )
 
 
 def test_np_log():

--- sklearn/semi_supervised/_self_training.py
+++ sklearn/semi_supervised/_self_training.py
@@ -217,8 +217,7 @@
         # TODO(1.8) remove
         elif self.estimator is None and self.base_estimator == "deprecated":
             raise ValueError(
-                "You must pass an estimator to SelfTrainingClassifier."
-                " Use `estimator`."
+                "You must pass an estimator to SelfTrainingClassifier. Use `estimator`."
             )
         elif self.estimator is not None and self.base_estimator != "deprecated":
             raise ValueError(

--- sklearn/tests/metadata_routing_common.py
+++ sklearn/tests/metadata_routing_common.py
@@ -74,9 +74,9 @@
     for record in all_records:
         # first check that the names of the metadata passed are the same as
         # expected. The names are stored as keys in `record`.
-        assert set(kwargs.keys()) == set(
-            record.keys()
-        ), f"Expected {kwargs.keys()} vs {record.keys()}"
+        assert set(kwargs.keys()) == set(record.keys()), (
+            f"Expected {kwargs.keys()} vs {record.keys()}"
+        )
         for key, value in kwargs.items():
             recorded_value = record[key]
             # The following condition is used to check for any specified parameters
@@ -87,9 +87,9 @@
                 if isinstance(recorded_value, np.ndarray):
                     assert_array_equal(recorded_value, value)
                 else:
-                    assert (
-                        recorded_value is value
-                    ), f"Expected {recorded_value} vs {value}. Method: {method}"
+                    assert recorded_value is value, (
+                        f"Expected {recorded_value} vs {value}. Method: {method}"
+                    )
 
 
 record_metadata_not_default = partial(record_metadata, record_default=False)

--- sklearn/tests/test_common.py
+++ sklearn/tests/test_common.py
@@ -296,7 +296,6 @@
     "transformer", GET_FEATURES_OUT_ESTIMATORS, ids=_get_check_estimator_ids
 )
 def test_transformers_get_feature_names_out(transformer):
-
     with ignore_warnings(category=(FutureWarning)):
         check_transformer_get_feature_names_out(
             transformer.__class__.__name__, transformer

--- sklearn/tests/test_discriminant_analysis.py
+++ sklearn/tests/test_discriminant_analysis.py
@@ -304,16 +304,16 @@
     clf_lda_eigen = LinearDiscriminantAnalysis(solver="eigen")
     clf_lda_eigen.fit(X, y)
     assert_almost_equal(clf_lda_eigen.explained_variance_ratio_.sum(), 1.0, 3)
-    assert clf_lda_eigen.explained_variance_ratio_.shape == (
-        2,
-    ), "Unexpected length for explained_variance_ratio_"
+    assert clf_lda_eigen.explained_variance_ratio_.shape == (2,), (
+        "Unexpected length for explained_variance_ratio_"
+    )
 
     clf_lda_svd = LinearDiscriminantAnalysis(solver="svd")
     clf_lda_svd.fit(X, y)
     assert_almost_equal(clf_lda_svd.explained_variance_ratio_.sum(), 1.0, 3)
-    assert clf_lda_svd.explained_variance_ratio_.shape == (
-        2,
-    ), "Unexpected length for explained_variance_ratio_"
+    assert clf_lda_svd.explained_variance_ratio_.shape == (2,), (
+        "Unexpected length for explained_variance_ratio_"
+    )
 
     assert_array_almost_equal(
         clf_lda_svd.explained_variance_ratio_, clf_lda_eigen.explained_variance_ratio_

--- sklearn/tests/test_metaestimators.py
+++ sklearn/tests/test_metaestimators.py
@@ -157,11 +157,12 @@
             if method in delegator_data.skip_methods:
                 continue
             assert hasattr(delegate, method)
-            assert hasattr(
-                delegator, method
-            ), "%s does not have method %r when its delegate does" % (
-                delegator_data.name,
-                method,
+            assert hasattr(delegator, method), (
+                "%s does not have method %r when its delegate does"
+                % (
+                    delegator_data.name,
+                    method,
+                )
             )
             # delegation before fit raises a NotFittedError
             if method == "score":
@@ -191,11 +192,12 @@
             delegate = SubEstimator(hidden_method=method)
             delegator = delegator_data.construct(delegate)
             assert not hasattr(delegate, method)
-            assert not hasattr(
-                delegator, method
-            ), "%s has method %r when its delegate does not" % (
-                delegator_data.name,
-                method,
+            assert not hasattr(delegator, method), (
+                "%s has method %r when its delegate does not"
+                % (
+                    delegator_data.name,
+                    method,
+                )
             )
 
 

--- sklearn/tree/tests/test_monotonic_tree.py
+++ sklearn/tree/tests/test_monotonic_tree.py
@@ -80,9 +80,9 @@
     est.fit(X_train, y_train)
     proba_test = est.predict_proba(X_test)
 
-    assert np.logical_and(
-        proba_test >= 0.0, proba_test <= 1.0
-    ).all(), "Probability should always be in [0, 1] range."
+    assert np.logical_and(proba_test >= 0.0, proba_test <= 1.0).all(), (
+        "Probability should always be in [0, 1] range."
+    )
     assert_allclose(proba_test.sum(axis=1), 1.0)
 
     # Monotonic increase constraint, it applies to the positive class

--- sklearn/tree/tests/test_tree.py
+++ sklearn/tree/tests/test_tree.py
@@ -198,10 +198,10 @@
 
 
 def assert_tree_equal(d, s, message):
-    assert (
-        s.node_count == d.node_count
-    ), "{0}: inequal number of node ({1} != {2})".format(
-        message, s.node_count, d.node_count
+    assert s.node_count == d.node_count, (
+        "{0}: inequal number of node ({1} != {2})".format(
+            message, s.node_count, d.node_count
+        )
     )
 
     assert_array_equal(
@@ -330,9 +330,9 @@
     reg = Tree(criterion=criterion, random_state=0)
     reg.fit(diabetes.data, diabetes.target)
     score = mean_squared_error(diabetes.target, reg.predict(diabetes.data))
-    assert score == pytest.approx(
-        0
-    ), f"Failed with {name}, criterion = {criterion} and score = {score}"
+    assert score == pytest.approx(0), (
+        f"Failed with {name}, criterion = {criterion} and score = {score}"
+    )
 
 
 @skip_if_32bit
@@ -697,10 +697,10 @@
         node_weights = np.bincount(out, weights=weights)
         # drop inner nodes
         leaf_weights = node_weights[node_weights != 0]
-        assert (
-            np.min(leaf_weights) >= total_weight * est.min_weight_fraction_leaf
-        ), "Failed with {0} min_weight_fraction_leaf={1}".format(
-            name, est.min_weight_fraction_leaf
+        assert np.min(leaf_weights) >= total_weight * est.min_weight_fraction_leaf, (
+            "Failed with {0} min_weight_fraction_leaf={1}".format(
+                name, est.min_weight_fraction_leaf
+            )
         )
 
     # test case with no weights passed in
@@ -720,10 +720,10 @@
         node_weights = np.bincount(out)
         # drop inner nodes
         leaf_weights = node_weights[node_weights != 0]
-        assert (
-            np.min(leaf_weights) >= total_weight * est.min_weight_fraction_leaf
-        ), "Failed with {0} min_weight_fraction_leaf={1}".format(
-            name, est.min_weight_fraction_leaf
+        assert np.min(leaf_weights) >= total_weight * est.min_weight_fraction_leaf, (
+            "Failed with {0} min_weight_fraction_leaf={1}".format(
+                name, est.min_weight_fraction_leaf
+            )
         )
 
 
@@ -845,10 +845,10 @@
             (est3, 0.0001),
             (est4, 0.1),
         ):
-            assert (
-                est.min_impurity_decrease <= expected_decrease
-            ), "Failed, min_impurity_decrease = {0} > {1}".format(
-                est.min_impurity_decrease, expected_decrease
+            assert est.min_impurity_decrease <= expected_decrease, (
+                "Failed, min_impurity_decrease = {0} > {1}".format(
+                    est.min_impurity_decrease, expected_decrease
+                )
             )
             est.fit(X, y)
             for node in range(est.tree_.node_count):
@@ -879,10 +879,10 @@
                         imp_parent - wtd_avg_left_right_imp
                     )
 
-                    assert (
-                        actual_decrease >= expected_decrease
-                    ), "Failed with {0} expected min_impurity_decrease={1}".format(
-                        actual_decrease, expected_decrease
+                    assert actual_decrease >= expected_decrease, (
+                        "Failed with {0} expected min_impurity_decrease={1}".format(
+                            actual_decrease, expected_decrease
+                        )
                     )
 
 
@@ -923,9 +923,9 @@
         assert type(est2) == est.__class__
 
         score2 = est2.score(X, y)
-        assert (
-            score == score2
-        ), "Failed to generate same score  after pickling with {0}".format(name)
+        assert score == score2, (
+            "Failed to generate same score  after pickling with {0}".format(name)
+        )
         for attribute in fitted_attribute:
             assert_array_equal(
                 getattr(est2.tree_, attribute),
@@ -2614,9 +2614,9 @@
     # Check that the tree can learn the predictive feature
     # over an average of cross-validation fits.
     tree_cv_score = cross_val_score(tree, X, y, cv=5).mean()
-    assert (
-        tree_cv_score >= expected_score
-    ), f"Expected CV score: {expected_score} but got {tree_cv_score}"
+    assert tree_cv_score >= expected_score, (
+        f"Expected CV score: {expected_score} but got {tree_cv_score}"
+    )
 
 
 @pytest.mark.parametrize(

--- sklearn/utils/_metadata_requests.py
+++ sklearn/utils/_metadata_requests.py
@@ -1101,8 +1101,9 @@
             method_mapping = MethodMapping()
             for method in METHODS:
                 method_mapping.add(caller=method, callee=method)
-            yield "$self_request", RouterMappingPair(
-                mapping=method_mapping, router=self._self_request
+            yield (
+                "$self_request",
+                RouterMappingPair(mapping=method_mapping, router=self._self_request),
             )
         for name, route_mapping in self._route_mappings.items():
             yield (name, route_mapping)

--- sklearn/utils/_test_common/instance_generator.py
+++ sklearn/utils/_test_common/instance_generator.py
@@ -961,8 +961,7 @@
     },
     HalvingGridSearchCV: {
         "check_fit2d_1sample": (
-            "Fail during parameter check since min/max resources requires"
-            " more samples"
+            "Fail during parameter check since min/max resources requires more samples"
         ),
         "check_estimators_nan_inf": "FIXME",
         "check_classifiers_one_label_sample_weights": "FIXME",
@@ -972,8 +971,7 @@
     },
     HalvingRandomSearchCV: {
         "check_fit2d_1sample": (
-            "Fail during parameter check since min/max resources requires"
-            " more samples"
+            "Fail during parameter check since min/max resources requires more samples"
         ),
         "check_estimators_nan_inf": "FIXME",
         "check_classifiers_one_label_sample_weights": "FIXME",

--- sklearn/utils/estimator_checks.py
+++ sklearn/utils/estimator_checks.py
@@ -4759,9 +4759,9 @@
     else:
         n_features_out = X_transform.shape[1]
 
-    assert (
-        len(feature_names_out) == n_features_out
-    ), f"Expected {n_features_out} feature names, got {len(feature_names_out)}"
+    assert len(feature_names_out) == n_features_out, (
+        f"Expected {n_features_out} feature names, got {len(feature_names_out)}"
+    )
 
 
 def check_transformer_get_feature_names_out_pandas(name, transformer_orig):
@@ -4816,9 +4816,9 @@
     else:
         n_features_out = X_transform.shape[1]
 
-    assert (
-        len(feature_names_out_default) == n_features_out
-    ), f"Expected {n_features_out} feature names, got {len(feature_names_out_default)}"
+    assert len(feature_names_out_default) == n_features_out, (
+        f"Expected {n_features_out} feature names, got {len(feature_names_out_default)}"
+    )
 
 
 def check_param_validation(name, estimator_orig):
@@ -5329,9 +5329,7 @@
                 'Only binary classification is supported. The type of the target '
                 f'is {{y_type}}.'
         )
-    """.format(
-        name=name
-    )
+    """.format(name=name)
     err_msg = textwrap.dedent(err_msg)
 
     with raises(

--- sklearn/utils/tests/test_indexing.py
+++ sklearn/utils/tests/test_indexing.py
@@ -583,7 +583,6 @@
 
 
 def test_notimplementederror():
-
     with pytest.raises(
         NotImplementedError,
         match="Resampling with sample_weight is only implemented for replace=True.",

--- sklearn/utils/tests/test_multiclass.py
+++ sklearn/utils/tests/test_multiclass.py
@@ -369,17 +369,17 @@
                     )
                 ]
                 for exmpl_sparse in examples_sparse:
-                    assert sparse_exp == is_multilabel(
-                        exmpl_sparse
-                    ), f"is_multilabel({exmpl_sparse!r}) should be {sparse_exp}"
+                    assert sparse_exp == is_multilabel(exmpl_sparse), (
+                        f"is_multilabel({exmpl_sparse!r}) should be {sparse_exp}"
+                    )
 
             # Densify sparse examples before testing
             if issparse(example):
                 example = example.toarray()
 
-            assert dense_exp == is_multilabel(
-                example
-            ), f"is_multilabel({example!r}) should be {dense_exp}"
+            assert dense_exp == is_multilabel(example), (
+                f"is_multilabel({example!r}) should be {dense_exp}"
+            )
 
 
 @pytest.mark.parametrize(
@@ -400,9 +400,9 @@
             example = xp.asarray(example, device=device)
 
             with config_context(array_api_dispatch=True):
-                assert dense_exp == is_multilabel(
-                    example
-                ), f"is_multilabel({example!r}) should be {dense_exp}"
+                assert dense_exp == is_multilabel(example), (
+                    f"is_multilabel({example!r}) should be {dense_exp}"
+                )
 
 
 def test_check_classification_targets():
@@ -420,12 +420,13 @@
 def test_type_of_target():
     for group, group_examples in EXAMPLES.items():
         for example in group_examples:
-            assert (
-                type_of_target(example) == group
-            ), "type_of_target(%r) should be %r, got %r" % (
-                example,
-                group,
-                type_of_target(example),
+            assert type_of_target(example) == group, (
+                "type_of_target(%r) should be %r, got %r"
+                % (
+                    example,
+                    group,
+                    type_of_target(example),
+                )
             )
 
     for example in NON_ARRAY_LIKE_EXAMPLES:

--- sklearn/utils/tests/test_seq_dataset.py
+++ sklearn/utils/tests/test_seq_dataset.py
@@ -154,30 +154,34 @@
 
 def test_buffer_dtype_mismatch_error():
     with pytest.raises(ValueError, match="Buffer dtype mismatch"):
-        ArrayDataset64(X32, y32, sample_weight32, seed=42),
+        (ArrayDataset64(X32, y32, sample_weight32, seed=42),)
 
     with pytest.raises(ValueError, match="Buffer dtype mismatch"):
-        ArrayDataset32(X64, y64, sample_weight64, seed=42),
+        (ArrayDataset32(X64, y64, sample_weight64, seed=42),)
 
     for csr_container in CSR_CONTAINERS:
         X_csr32 = csr_container(X32)
         X_csr64 = csr_container(X64)
         with pytest.raises(ValueError, match="Buffer dtype mismatch"):
-            CSRDataset64(
-                X_csr32.data,
-                X_csr32.indptr,
-                X_csr32.indices,
-                y32,
-                sample_weight32,
-                seed=42,
-            ),
+            (
+                CSRDataset64(
+                    X_csr32.data,
+                    X_csr32.indptr,
+                    X_csr32.indices,
+                    y32,
+                    sample_weight32,
+                    seed=42,
+                ),
+            )
 
         with pytest.raises(ValueError, match="Buffer dtype mismatch"):
-            CSRDataset32(
-                X_csr64.data,
-                X_csr64.indptr,
-                X_csr64.indices,
-                y64,
-                sample_weight64,
-                seed=42,
-            ),
+            (
+                CSRDataset32(
+                    X_csr64.data,
+                    X_csr64.indptr,
+                    X_csr64.indices,
+                    y64,
+                    sample_weight64,
+                    seed=42,
+                ),
+            )

--- sklearn/utils/tests/test_tags.py
+++ sklearn/utils/tests/test_tags.py
@@ -565,7 +565,6 @@
     assert _to_new_tags(_to_old_tags(new_tags), estimator=estimator) == new_tags
 
     class MyClass:
-
         def fit(self, X, y=None):
             return self  # pragma: no cover
 

--- sklearn/utils/tests/test_validation.py
+++ sklearn/utils/tests/test_validation.py
@@ -852,9 +852,9 @@
         def fit(self, X, y, sample_weight=None):
             pass
 
-    assert has_fit_parameter(
-        TestClassWithDeprecatedFitMethod, "sample_weight"
-    ), "has_fit_parameter fails for class with deprecated fit method."
+    assert has_fit_parameter(TestClassWithDeprecatedFitMethod, "sample_weight"), (
+        "has_fit_parameter fails for class with deprecated fit method."
+    )
 
 
 def test_check_symmetric():

--- sklearn/utils/validation.py
+++ sklearn/utils/validation.py
@@ -1547,8 +1547,7 @@
         # hasattr(estimator, "fit") makes it so that we don't fail for an estimator
         # that does not have a `fit` method during collection of checks. The right
         # checks will fail later.
-        hasattr(estimator, "fit")
-        and parameter in signature(estimator.fit).parameters
+        hasattr(estimator, "fit") and parameter in signature(estimator.fit).parameters
     )
 
 

61 files would be reformatted, 858 files already formatted

Generated for commit: 31784f4. Link to the linter CI: here

@thomasjpfan marked this pull request as draft April 5, 2025 16:21
@thomasjpfan (Member, Author) commented Apr 5, 2025

Looks like wheel building works!

There is an issue with the Windows build because there is no 3.13t pandas build: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544/job/40036367623?pr=31151 (also failing on main).

We can see that the wheels from this PR are smaller by ~25%:

The wheels from this PR: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544?pr=31151
The wheels from main: https://github.com/scikit-learn/scikit-learn/actions/runs/14278107805

@jeremiedbb (Member)

Thanks for trying this out!
Let's hope they release 3.1.0 before 1.7 😄

For the sake of precision, you wrote that it fixes #27767, but I think there are still suggestions there that we haven't tried yet and that could reduce the size even more (#27767 (comment)).

@lesteve (Member) commented Apr 9, 2025

Thanks for the PR, the general direction seems fine. Even if we don't require a recent enough Cython version, the cython >= 3.1.0 branch will be tested in the scipy-dev build.

FYI I have a PR working around the lack of Windows free-threaded wheel for pandas: #31159.

There is also some investigation on the pandas side, see pandas-dev/pandas#61242 and pandas-dev/pandas#61249 (apparently the likely culprit is cython-dev).

subdir: 'sklearn',
cython_args: cython_args,
install: true,
install_tag: 'python-runtime',
Member

I haven't fully understood what install tags are used for, but we are not using them in scikit-learn for now, so I would remove this line.

Contributor

The install tags are quite useful from a packaging perspective. You can use them to give downstream packagers a convenient way to further reduce the size of the binaries. Here's an example PR from NumPy: numpy/numpy#26274

For reference: https://docs.scipy.org/doc/scipy/building/redistributable_binaries.html

Additional context: recently, we released https://github.com/pyodide/pyodide-build/releases/v0.30.0, where I added a feature that allows building a package without isolation. I'm now planning to use this support in NumPy/SciPy for Pyodide so that we can have an easier way to unvendor the tests away from the WASM binaries. If scikit-learn can also implement these, it makes the wheels smaller, and that means that the interactive docs use less bandwidth and load a tad faster.

See also: scikit-image/scikit-image#7470
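
For a concrete (illustrative) example of how these tags are consumed downstream, a packager building from source can select tags at install time, for instance:

meson install -C builddir --tags python-runtime

Only files carrying the requested tag are installed; anything tagged differently (tests, development files, ...) is left out, which is how install tags translate into smaller binaries.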

@thomasjpfan marked this pull request as ready for review April 10, 2025 01:36
@@ -22,7 +22,7 @@ foreach name: name_list
# TODO in principle this should go in py.exension_module below. This is
# temporary work-around for dependency issue with .pyx.tp files. For more
# details, see https://github.com/mesonbuild/meson/issues/13212
depends: [linear_model_cython_tree, utils_cython_tree],
@lesteve (Member) Apr 10, 2025

My current understanding is that you only added cython_shared_module to the dependencies of the custom targets doing pyx.tp -> pyx.

Do you know why this is needed only there and not in all py.extension_module?

Member

I am not sure why exactly, but locally not adding cython_shared_module to dependencies seems to work fine. @thomasjpfan do you confirm?

It seems like when you do cython --shared=sklearn._cyutility you don't need sklearn._cyutility to exist at build time, but probably at import time.

Member Author

Same for me, not adding cython_shared_module to dependencies works on CI and locally.

@matusvalo With cython --shared=sklearn._cyutility, does it specify that module for import time and not build time?

Reply:

It seems like when you do cython --shared=sklearn._cyutility you don't need sklearn._cyutility to exist at build time, but probably at import time.

No, it is not a build-time dependency. Using --shared=sklearn._cyutility just causes the sklearn._cyutility module to be cimported when the extension module is loaded. Nothing more.
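
To illustrate that distinction (a hypothetical run-time check; sklearn._loss._loss below is just an example of an extension that would be built with --shared), the shared module only has to be importable when the extensions are used, and its presence also doubles as a way to tell which Cython built the wheel:

# Hypothetical check at run time (not build time): the shared utility module
# only needs to exist when a --shared-built extension is imported.
import importlib

try:
    importlib.import_module("sklearn._cyutility")
    print("sklearn._cyutility present: built with Cython >= 3.1 and --shared")
except ImportError:
    print("no sklearn._cyutility: built without the shared utility module")

# Importing any extension compiled with --shared=sklearn._cyutility triggers
# the cimport of the shared module at load time, so a missing _cyutility would
# surface here as an ImportError rather than as a build failure.
import sklearn._loss._loss  # example extension module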

@thomasjpfan marked this pull request as draft April 10, 2025 21:09
@thomasjpfan marked this pull request as ready for review April 11, 2025 01:14
@thomasjpfan (Member, Author) left a comment

Given that sklearn._cyutility is not a build dependency, it does not need to be passed around as depends or dependencies.

I tested this on wheel CI with Cython==3.1.0b1 here: https://github.com/scikit-learn/scikit-learn/actions/runs/14390353392?pr=31151

@lesteve changed the title from "Use Cython's shared memoryview utility between extension modules" to "BLS Use Cython's shared memoryview utility to reduce wheel size" Apr 14, 2025
@lesteve changed the title from "BLS Use Cython's shared memoryview utility to reduce wheel size" to "BLD Use Cython's shared memoryview utility to reduce wheel size" Apr 14, 2025
@lesteve (Member) left a comment

LGTM, thanks!

I pushed a commit with [scipy-dev] just in case, since this will be the build that tests cython >= 3.1.0 if we merge this.

I guess something to decide is whether we are happy to merge this now or wait for Cython 3.1.0 to be released.

If we merge this now, the slight downside is that for wheels the change will be picked up automatically as soon as cython 3.1.0 is released.

I am fine merging this now, the scipy-dev build will test this in case there are unexpected issues.

One way to know whether cython >= 3.1.0 was used to build scikit-learn is to do python -c 'import sklearn._cyutility'. If that works, cython >= 3.1.0 was used.

@jeremiedbb (Member)

Given that we are really close to a release, I'd rather not merge it right now, but wait for an actual cython release instead.
