From a3a8c549b9d3d547f607662b87f588c94aa3757f Mon Sep 17 00:00:00 2001 From: Guillaume Lemaitre Date: Mon, 11 Dec 2023 14:48:01 +0100 Subject: [PATCH 1/3] MAINT order changelog and fix some entries --- doc/whats_new/v1.4.rst | 336 +++++++++++++++++++++-------------------- 1 file changed, 172 insertions(+), 164 deletions(-) diff --git a/doc/whats_new/v1.4.rst b/doc/whats_new/v1.4.rst index 7ccfb22775fb9..a850f50ca86f0 100644 --- a/doc/whats_new/v1.4.rst +++ b/doc/whats_new/v1.4.rst @@ -24,24 +24,21 @@ random sampling procedures. solvers `"lbfgs"` and `"newton-cg"`. Both solvers can now reach much higher precision for the coefficients depending on the specified `tol`. Additionally, lbfgs can make better use of `tol`, i.e., stop sooner or reach higher precision. + Note: The lbfgs is the default solver, so this change might effect many models. + This change also means that with this new version of scikit-learn, the resulting + coefficients `coef_` and `intercept_` of your models will change for these two + solvers (when fit on the same data again). The amount of change depends on the + specified `tol`, for small values you will get more precise results. :pr:`26721` by :user:`Christian Lorentzen `. - .. note:: - - The lbfgs is the default solver, so this change might effect many models. - - This change also means that with this new version of scikit-learn, the resulting - coefficients `coef_` and `intercept_` of your models will change for these two - solvers (when fit on the same data again). The amount of change depends on the - specified `tol`, for small values you will get more precise results. - - |Fix| fixes a memory leak seen in PyPy for estimators using the Cython loss functions. :pr:`27670` by :user:`Guillaume Lemaitre `. Changes impacting all modules ----------------------------- -- |MajorFeature| Transformers now support polars output with `set_output(transform="polars")`. 
+- |MajorFeature| Transformers now support polars output with + `set_output(transform="polars")`. :pr:`27315` by `Thomas Fan`_. - |Enhancement| All estimators now recognizes the column names from any dataframe @@ -77,12 +74,12 @@ more details. :class:`multiclass.OneVsOneClassifier` and :class:`multiclass.OutputCodeClassifier` now support metadata routing in their ``fit`` and ``partial_fit``, and route metadata to the underlying - estimator's ``fit`` and ``partial_fit``. :pr:`27308` by :user:`Stefanie - Senger `. + estimator's ``fit`` and ``partial_fit``. + :pr:`27308` by :user:`Stefanie Senger `. - |Feature| :class:`pipeline.Pipeline` now supports metadata routing according - to :ref:`metadata routing user guide `. :pr:`26789` by - `Adrin Jalali`_. + to :ref:`metadata routing user guide `. + :pr:`26789` by `Adrin Jalali`_. - |Feature| :func:`~model_selection.cross_validate`, :func:`~model_selection.cross_val_score`, and @@ -91,20 +88,20 @@ more details. splitter's `split`. The metadata is accepted via the new `params` parameter. `fit_params` is deprecated and will be removed in version 1.6. `groups` parameter is also not accepted as a separate argument when metadata routing - is enabled and should be passed via the `params` parameter. :pr:`26896` by - `Adrin Jalali`_. + is enabled and should be passed via the `params` parameter. + :pr:`26896` by `Adrin Jalali`_. - |Feature| :class:`~model_selection.GridSearchCV`, :class:`~model_selection.RandomizedSearchCV`, :class:`~model_selection.HalvingGridSearchCV`, and :class:`~model_selection.HalvingRandomSearchCV` now support metadata routing in their ``fit`` and ``score``, and route metadata to the underlying - estimator's ``fit``, the CV splitter, and the scorer. :pr:`27058` by `Adrin - Jalali`_. + estimator's ``fit``, the CV splitter, and the scorer. + :pr:`27058` by `Adrin Jalali`_. - |Feature| :class:`~compose.ColumnTransformer` now supports metadata routing - according to :ref:`metadata routing user guide `. 
:pr:`27005` - by `Adrin Jalali`_. + according to :ref:`metadata routing user guide `. + :pr:`27005` by `Adrin Jalali`_. - |Feature| :class:`linear_model.LogisticRegressionCV` now supports metadata routing. :meth:`linear_model.LogisticRegressionCV.fit` now @@ -119,19 +116,20 @@ more details. - |Feature| :class:`linear_model.OrthogonalMatchingPursuitCV` now supports metadata routing. Its `fit` now accepts ``**fit_params``, which are passed to - the underlying splitter. :pr:`27500` by :user:`Stefanie Senger - `. - -- |Fix| All meta-estimators for which metadata routing is not yet implemented - now raise a `NotImplementedError` on `get_metadata_routing` and on `fit` if - metadata routing is enabled and any metadata is passed to them. :pr:`27389` - by `Adrin Jalali`_. + the underlying splitter. + :pr:`27500` by :user:`Stefanie Senger `. - |Feature| :class:`ElasticNetCV`, :class:`LassoCV`, :class:`MultiTaskElasticNetCV` and :class:`MultiTaskLassoCV` now support metadata routing and route metadata to the CV splitter. :pr:`27478` by :user:`Omar Salman `. +- |Fix| All meta-estimators for which metadata routing is not yet implemented + now raise a `NotImplementedError` on `get_metadata_routing` and on `fit` if + metadata routing is enabled and any metadata is passed to them. + :pr:`27389` by `Adrin Jalali`_. 
+ + Support for SciPy sparse arrays ------------------------------- @@ -141,7 +139,8 @@ and classes are impacted: **Functions:** - :func:`cluster.compute_optics_graph` in :pr:`27104` by - :user:`Maren Westermann ` and in :pr:`27250` by :user:`Yao Xiao `; + :user:`Maren Westermann ` and in :pr:`27250` by + :user:`Yao Xiao `; - :func:`cluster.kmeans_plusplus` in :pr:`27179` by :user:`Nurseit Kamchyev `; - :func:`decomposition.non_negative_factorization` in :pr:`27100` by :user:`Isaac Virshup `; @@ -156,7 +155,7 @@ and classes are impacted: :user:`Yao Xiao `; - :func:`metrics.pairwise.pairwise_kernels` in :pr:`27250` by :user:`Yao Xiao `; -- :func:`sklearn.utils.multiclass.type_of_target` in :pr:`27274` by +- :func:`utils.multiclass.type_of_target` in :pr:`27274` by :user:`Yao Xiao `. **Classes:** @@ -165,13 +164,16 @@ and classes are impacted: - :class:`cluster.KMeans` in :pr:`27179` by :user:`Nurseit Kamchyev `; - :class:`cluster.MiniBatchKMeans` in :pr:`27179` by :user:`Nurseit Kamchyev `; - :class:`cluster.OPTICS` in :pr:`27104` by - :user:`Maren Westermann ` and in :pr:`27250` by :user:`Yao Xiao `; -- :class:`decomposition.NMF` in :pr:`27100` by :user:`Isaac Virshup `; + :user:`Maren Westermann ` and in :pr:`27250` by + :user:`Yao Xiao `; +- :class:`cluster.SpectralClustering` in :pr:`27161` by + :user:`Bharat Raghunathan `; - :class:`decomposition.MiniBatchNMF` in :pr:`27100` by :user:`Isaac Virshup `; +- :class:`decomposition.NMF` in :pr:`27100` by :user:`Isaac Virshup `; - :class:`feature_extraction.text.TfidfTransformer` in :pr:`27219` by :user:`Yao Xiao `; -- :class:`cluster.Isomap` in :pr:`27250` by :user:`Yao Xiao `; +- :class:`manifold.Isomap` in :pr:`27250` by :user:`Yao Xiao `; - :class:`manifold.SpectralEmbedding` in :pr:`27240` by :user:`Yao Xiao `; - :class:`manifold.TSNE` in :pr:`27250` by :user:`Yao Xiao `; - :class:`impute.SimpleImputer` in :pr:`27277` by :user:`Yao Xiao `; @@ -182,13 +184,42 @@ and classes are impacted: - 
:class:`neural_network.BernoulliRBM` in :pr:`27252` by :user:`Yao Xiao `; - :class:`preprocessing.PolynomialFeatures` in :pr:`27166` by - :user:`Mohit Joshi `. -- :class:`cluster.SpectralClustering` in :pr:`27161` by :user:`Bharat Raghunathan `; + :user:`Mohit Joshi `; - :class:`random_projection.GaussianRandomProjection` in :pr:`27314` by - :user:`Stefanie Senger `. -- :class:`random_projection.SparseRandomProjection`in :pr:`27314` by + :user:`Stefanie Senger `; +- :class:`random_projection.SparseRandomProjection` in :pr:`27314` by :user:`Stefanie Senger `. +Support for Array API +--------------------- + +Several estimators and functions support the +`Array API `_. Such changes allows for using +the estimators and functions with other libraries such as JAX, CuPy, and PyTorch. +It therefore enable for some GPU-accelerated computations. + +See :ref:`array_api` for more details. + +**Functions:** + +- :func:`sklearn.metrics.accuracy_score` and :func:`sklearn.metrics.zero_one_loss` in + :pr:`27137` by :user:`Edoardo Abati `; +- :func:`sklearn.model_selection.train_test_split` in :pr:`26855` by `Tim Head`_; +- :func:`~utils.multiclass.is_multilabel` in :pr:`27601` by + :user:`Yaroslav Korobko `. + +**Classes:** + +- :class:`decomposition.PCA` for the `full` and `randomized` solvers (with QR power + iterations) in :pr:`26315`, :pr:`27098` and :pr:`27431` by + :user:`Mateusz Sokół `, :user:`Olivier Grisel ` and + :user:`Edoardo Abati `; +- :class:`preprocessing.KernelCenterer` in :pr:`27556` by + :user:`Edoardo Abati `; +- :class:`preprocessing.MaxAbsScaler` in :pr:`27110` by :user:`Edoardo Abati `; +- :class:`preprocessing.MinMaxScaler` in :pr:`26243` by `Tim Head`_; +- :class:`preprocessing.Normalizer` in :pr:`27558` by :user:`Edoardo Abati `. 
+ Changelog --------- @@ -209,29 +240,31 @@ Changelog - |Enhancement| :meth:`base.ClusterMixin.fit_predict` and :meth:`base.OutlierMixin.fit_predict` now accept ``**kwargs`` which are - passed to the ``fit`` method of the estimator. :pr:`26506` by `Adrin - Jalali`_. + passed to the ``fit`` method of the estimator. + :pr:`26506` by `Adrin Jalali`_. - |Enhancement| :meth:`base.TransformerMixin.fit_transform` and :meth:`base.OutlierMixin.fit_predict` now raise a warning if ``transform`` / ``predict`` consume metadata, but no custom ``fit_transform`` / ``fit_predict`` - is defined in the class inheriting from them correspondingly. :pr:`26831` by - `Adrin Jalali`_. + is defined in the class inheriting from them correspondingly. + :pr:`26831` by `Adrin Jalali`_. - |Enhancement| :func:`base.clone` now supports `dict` as input and creates a - copy. :pr:`26786` by `Adrin Jalali`_. + copy. + :pr:`26786` by `Adrin Jalali`_. - |API|:func:`~utils.metadata_routing.process_routing` now has a different signature. The first two (the object and the method) are positional only, - and all metadata are passed as keyword arguments. :pr:`26909` by `Adrin - Jalali`_. + and all metadata are passed as keyword arguments. + :pr:`26909` by `Adrin Jalali`_. :mod:`sklearn.calibration` .......................... - |Enhancement| The internal objective and gradient of the `sigmoid` method of :class:`calibration.CalibratedClassifierCV` have been replaced by the - private loss module. :pr:`27185` by :user:`Omar Salman `. + private loss module. + :pr:`27185` by :user:`Omar Salman `. :mod:`sklearn.cluster` ...................... @@ -239,14 +272,8 @@ Changelog - |Fix| The `degree` parameter in the :class:`cluster.SpectralClustering` constructor now accepts real values instead of only integral values in accordance with the `degree` parameter of the - :class:`sklearn.metrics.pairwise.polynomial_kernel`. :pr:`27668` by - :user:`Nolan McMahon `. 
- -- |API| `kdtree` and `balltree` values are now deprecated and are renamed as - `kd_tree` and `ball_tree` respectively for the `algorithm` parameter of - :class:`cluster.HDBSCAN` ensuring consistency in naming convention. - `kdtree` and `balltree` values will be removed in 1.6. - :pr:`26744` by :user:`Shreesha Kumar Bhat `. + :class:`sklearn.metrics.pairwise.polynomial_kernel`. + :pr:`27668` by :user:`Nolan McMahon `. - |Fix| Fixes a bug in :class:`cluster.OPTICS` where the cluster correction based on predecessor was not using the right indexing. It would lead to inconsistent results @@ -259,37 +286,45 @@ Changelog :pr:`27678` by :user:`Ganesh Tata `. - |Fix| Create copy of precomputed sparse matrix within the - `fit` method of `cluster.DBSCAN` to avoid in-place modification of + `fit` method of :class:`cluster.DBSCAN` to avoid in-place modification of the sparse matrix. :pr:`27651` by :user:`Ganesh Tata `. +- |Fix| Raises a proper `ValueError` when `metric="precomputed"` and requested storing + centers via the parameter `store_centers`. + :pr:`27898` by :user:`Guillaume Lemaitre `. + +- |API| `kdtree` and `balltree` values are now deprecated and are renamed as + `kd_tree` and `ball_tree` respectively for the `algorithm` parameter of + :class:`cluster.HDBSCAN` ensuring consistency in naming convention. + `kdtree` and `balltree` values will be removed in 1.6. + :pr:`26744` by :user:`Shreesha Kumar Bhat `. + - |API| The option `metric=None` in - :class:`cluster.AggomerativeClustering` and :class:`cluster.FeatureAgglomeration` + :class:`cluster.AgglomerativeClustering` and :class:`cluster.FeatureAgglomeration` is deprecated in version 1.4 and will be removed in version 1.6. Use the default value instead. :pr:`27828` by :user:`Guillaume Lemaitre `. -- |Fix| Raises a proper `ValueError` when `metric="precomputed"` and requested storing - centers via the parameter `store_centers`. - :pr:`27898` by :user:`Guillaume Lemaitre `. 
- :mod:`sklearn.compose` ...................... - |MajorFeature| Adds `polars `__ input support to :class:`compose.ColumnTransformer` through the `DataFrame Interchange Protocol `__. - The minimum supported version for polars is `0.19.12`. :pr:`26683` by `Thomas Fan`_. - -- |API| |FIX| :class:`~compose.ColumnTransformer` now replaces `"passthrough"` - with a corresponding :class:`~preprocessing.FunctionTransformer` in the - fitted ``transformers_`` attribute. :pr:`27204` by `Adrin Jalali`_. + The minimum supported version for polars is `0.19.12`. + :pr:`26683` by `Thomas Fan`_. - |Fix| :func:`cluster.spectral_clustering` and :class:`cluster.SpectralClustering` now raise an explicit error message indicating that sparse matrices and arrays with `np.int64` indices are not supported. :pr:`27240` by :user:`Yao Xiao `. +- |API| |FIX| :class:`~compose.ColumnTransformer` now replaces `"passthrough"` + with a corresponding :class:`~preprocessing.FunctionTransformer` in the + fitted ``transformers_`` attribute. + :pr:`27204` by `Adrin Jalali`_. + :mod:`sklearn.datasets` ....................... @@ -308,23 +343,19 @@ Changelog - |Enhancement| An "auto" option was added to the `n_components` parameter of :func:`decomposition.non_negative_factorization`, :class:`decomposition.NMF` and - :class:`decomposition.MiniBatchNMF` to automatically infer the number of components from W or H shapes - when using a custom initialization. The default value of this parameter will change - from `None` to `auto` in version 1.6. + :class:`decomposition.MiniBatchNMF` to automatically infer the number of components + from W or H shapes when using a custom initialization. The default value of this + parameter will change from `None` to `auto` in version 1.6. :pr:`26634` by :user:`Alexandre Landeau ` and :user:`Alexandre Vigny `. -- |Enhancement| :class:`decomposition.PCA` now supports the Array API for the - `full` and `randomized` solvers (with QR power iterations). 
See - :ref:`array_api` for more details. - :pr:`26315`, :pr:`27098` and :pr:`27431` by :user:`Mateusz Sokół `, - :user:`Olivier Grisel ` and :user:`Edoardo Abati `. - - |Feature| :class:`decomposition.PCA` now supports :class:`scipy.sparse.sparray` and :class:`scipy.sparse.spmatrix` inputs when using the `arpack` solver. When used on sparse data like :func:`datasets.fetch_20newsgroups_vectorized` this can lead to speed-ups of 100x (single threaded) and 70x lower memory usage. - Based on :user:`Alexander Tarashansky `'s implementation in `scanpy `. - :pr:`18689` by :user:`Isaac Virshup ` and :user:`Andrey Portnoy `. + Based on :user:`Alexander Tarashansky `'s implementation in + `scanpy `_. + :pr:`18689` by :user:`Isaac Virshup ` and + :user:`Andrey Portnoy `. - |Fix| :func:`decomposition.dict_learning_online` does not ignore anymore the parameter `max_iter`. @@ -333,8 +364,8 @@ Changelog - |Fix| The `degree` parameter in the :class:`decomposition.KernelPCA` constructor now accepts real values instead of only integral values in accordance with the `degree` parameter of the - :class:`sklearn.metrics.pairwise.polynomial_kernel`. :pr:`27668` by - :user:`Nolan McMahon `. + :class:`sklearn.metrics.pairwise.polynomial_kernel`. + :pr:`27668` by :user:`Nolan McMahon `. - |API| The option `max_iter=None` in :class:`decomposition.MiniBatchDictionaryLearning`, @@ -350,7 +381,8 @@ Changelog :class:`ensemble.RandomForestRegressor` support missing values when the criterion is `gini`, `entropy`, or `log_loss`, for classification or `squared_error`, `friedman_mse`, or `poisson` - for regression. :pr:`26391` by `Thomas Fan`_. + for regression. + :pr:`26391` by `Thomas Fan`_. - |MajorFeature| :class:`ensemble.HistGradientBoostingClassifier` and :class:`ensemble.HistGradientBoostingRegressor` supports @@ -360,8 +392,8 @@ Changelog Categorical features no longer need to be encoded with numbers. 
When categorical features are numbers, the maximum value no longer needs to be smaller than `max_bins`; only the number of (unique) categories must be - smaller than `max_bins`. :pr:`26411` by `Thomas Fan`_ and :pr:`27835` by - :user:`Jérôme Dockès `. + smaller than `max_bins`. + :pr:`26411` by `Thomas Fan`_ and :pr:`27835` by :user:`Jérôme Dockès `. - |MajorFeature| :class:`ensemble.HistGradientBoostingClassifier` and :class:`ensemble.HistGradientBoostingRegressor` got the new parameter @@ -396,26 +428,24 @@ Changelog - |Efficiency| :class:`ensemble.HistGradientBoostingClassifier` and :class:`ensemble.HistGradientBoostingRegressor` is now faster when `scoring` is a predefined metric listed in :func:`metrics.get_scorer_names` and - early stopping is enabled. :pr:`26163` by `Thomas Fan`_. + early stopping is enabled. + :pr:`26163` by `Thomas Fan`_. -- |Fix| Fixes :class:`ensemble.IsolationForest` when the input is a sparse matrix and - `contamination` is set to a float value. - :pr:`27645` by :user:`Guillaume Lemaitre `. - -- |API| In :class:`ensemble.AdaBoostClassifier`, the `algorithm` argument `SAMME.R` was - deprecated and will be removed in 1.6. :pr:`26830` by :user:`Stefanie Senger - `. - -- |Enhancement| A fitted property, ``estimators_samples_``, was added to all Forest methods, - including +- |Enhancement| A fitted property, ``estimators_samples_``, was added to all Forest + methods, including :class:`ensemble.RandomForestClassifier`, :class:`ensemble.RandomForestRegressor`, :class:`ensemble.ExtraTreesClassifier` and :class:`ensemble.ExtraTreesRegressor`, which allows to retrieve the training sample indices used for each tree estimator. :pr:`26736` by :user:`Adam Li `. +- |Fix| Fixes :class:`ensemble.IsolationForest` when the input is a sparse matrix and + `contamination` is set to a float value. + :pr:`27645` by :user:`Guillaume Lemaitre `. 
+ - |Fix| Raises a `ValueError` in :class:`ensemble.RandomForestRegressor` and :class:`ensemble.ExtraTreesRegressor` when requesting OOB score with multioutput model - for the targets being all rounded to integer. It was recognized as a multiclass problem. + for the targets being all rounded to integer. It was recognized as a multiclass + problem. :pr:`27817` by :user:`Daniele Ongari ` - |Fix| Changes estimator tags to acknowledge that @@ -424,6 +454,10 @@ Changelog support missing values if all `estimators` support missing values. :pr:`27710` by :user:`Guillaume Lemaitre `. +- |API| In :class:`ensemble.AdaBoostClassifier`, the `algorithm` argument `SAMME.R` was + deprecated and will be removed in 1.6. + :pr:`26830` by :user:`Stefanie Senger `. + :mod:`sklearn.feature_extraction` ................................. @@ -442,17 +476,17 @@ Changelog :class:`feature_selection.SelectPercentile`, and :class:`feature_selection.GenericUnivariateSelect` now support unsupervised feature selection by providing a `score_func` taking `X` and `y=None`. - :pr:`27721` by :user:`Guillaume Lemaitre .` - -- |Fix| :class:`feature_selection.RFE` and :class:`feature_selection.RFECV` do - not check for nans during input validation. - :pr:`21807` by `Thomas Fan`_. + :pr:`27721` by :user:`Guillaume Lemaitre `. - |Enhancement| :class:`feature_selection.SelectKBest` and :class:`feature_selection.GenericUnivariateSelect` with `mode='k_best'` now shows a warning when `k` is greater than the number of features. :pr:`27841` by `Thomas Fan`_. +- |Fix| :class:`feature_selection.RFE` and :class:`feature_selection.RFECV` do + not check for nans during input validation. + :pr:`21807` by `Thomas Fan`_. + :mod:`sklearn.inspection` ......................... 
@@ -473,9 +507,8 @@ Changelog - |Fix| The `degree` parameter in the :class:`kernel_ridge.KernelRidge` constructor now accepts real values instead of only integral values in accordance with the `degree` parameter of the - :class:`sklearn.metrics.pairwise.polynomial_kernel`. :pr:`27668` by - :user:`Nolan McMahon `. - + :class:`sklearn.metrics.pairwise.polynomial_kernel`. + :pr:`27668` by :user:`Nolan McMahon `. :mod:`sklearn.linear_model` ........................... @@ -489,13 +522,6 @@ Changelog sample losses instead of sum of per sample losses. :pr:`26721` by :user:`Christian Lorentzen `. - .. note:: - - This change also means that with this new version of scikit-learn, the resulting - coefficients `coef_` and `intercept_` of your models will change for these two - solvers (when fit on the same data again). The amount of change depends on the - specified `tol`, for small values you will get more precise results. - - |Efficiency| :class:`linear_model.LogisticRegression` and :class:`linear_model.LogisticRegressionCV` with solver `"newton-cg"` can now be considerably faster for some data and parameter settings. This is accomplished by a @@ -511,25 +537,20 @@ Changelog - |Fix| Ensure that the `sigma_` attribute of :class:`linear_model.ARDRegression` and :class:`linear_model.BayesianRidge` always has a `float32` dtype when fitted on `float32` data, even with the - type promotion rules of numpy 2. + type promotion rules of NumPy 2. :pr:`27899` by :user:`Olivier Grisel `. :mod:`sklearn.metrics` ...................... -- |Fix| computing pairwise distances with :func:`euclidean_distances` no longer - raises an exception when `X` is provided as a `float64` array and - `X_norm_squared` as a `float32` array. :pr:`27624` by - :user:`Jérôme Dockès `. - - |Efficiency| Computing pairwise distances via :class:`metrics.DistanceMetric` - for CSR × CSR, Dense × CSR, and CSR × Dense datasets is now 1.5x faster. 
- :pr:`26765` by :user:`Meekail Zain ` + for CSR x CSR, Dense x CSR, and CSR x Dense datasets is now 1.5x faster. + :pr:`26765` by :user:`Meekail Zain `. - |Efficiency| Computing distances via :class:`metrics.DistanceMetric` - for CSR × CSR, Dense × CSR, and CSR × Dense now uses ~50% less memory, + for CSR x CSR, Dense x CSR, and CSR x Dense now uses ~50% less memory, and outputs distances in the same dtype as the provided data. - :pr:`27006` by :user:`Meekail Zain ` + :pr:`27006` by :user:`Meekail Zain `. - |Enhancement| Improve the rendering of the plot obtained with the :class:`metrics.PrecisionRecallDisplay` and :class:`metrics.RocCurveDisplay` @@ -540,9 +561,14 @@ Changelog - |Enhancement| Added `neg_root_mean_squared_log_error_scorer` as scorer :pr:`26734` by :user:`Alejandro Martin Gil <101AlexMartin>`. -- |Enhancement| :func:`sklearn.metrics.accuracy_score` and - :func:`sklearn.metrics.zero_one_loss` now support Array API compatible inputs. - :pr:`27137` by :user:`Edoardo Abati `. +- |Enhancement| :func:`metrics.confusion_matrix` now warns when only one label was + found in `y_true` and `y_pred`. + :pr:`27650` by :user:`Lucy Liu `. + +- |Fix| computing pairwise distances with :func:`metrics.pairwise.euclidean_distances` + no longer raises an exception when `X` is provided as a `float64` array and + `X_norm_squared` as a `float32` array. + :pr:`27624` by :user:`Jérôme Dockès `. - |Fix| :func:`f1_score` now provides correct values when handling various cases in which division by zero occurs by using a formulation that does not @@ -550,6 +576,11 @@ Changelog :pr:`27577` by :user:`Omar Salman ` and :user:`Guillaume Lemaitre `. +- |Fix| :func:`metrics.make_scorer` now raises an error when using a regressor on a + scorer requesting a non-thresholded decision function (from `decision_function` or + `predict_proba`). Such scorer are specific to classification. + :pr:`26840` by :user:`Guillaume Lemaitre `. 
+ - |API| Deprecated `needs_threshold` and `needs_proba` from :func:`metrics.make_scorer`. These parameters will be removed in version 1.6. Instead, use `response_method` that accepts `"predict"`, `"predict_proba"` or `"decision_function"` or a list of such @@ -564,20 +595,9 @@ Changelog :func:`metrics.root_mean_squared_log_error` instead. :pr:`26734` by :user:`Alejandro Martin Gil <101AlexMartin>`. -- |Fix| :func:`metrics.make_scorer` now raises an error when using a regressor on a - scorer requesting a non-thresholded decision function (from `decision_function` or - `predict_proba`). Such scorer are specific to classification. - :pr:`26840` by :user:`Guillaume Lemaitre `. - -- |Enhancement| :func:`metrics.confusion_matrix` now warns when only one label was - found in `y_true` and `y_pred`. :pr:`27650` by :user:`Lucy Liu `. - :mod:`sklearn.model_selection` .............................. -- |Enhancement| :func:`sklearn.model_selection.train_test_split` now supports - Array API compatible inputs. :pr:`26855` by `Tim Head`_. - - |Enhancement| :func:`model_selection.learning_curve` raises a warning when every cross validation fold fails. :pr:`26299` by :user:`Rahil Parikh `. @@ -585,8 +605,8 @@ Changelog - |Fix| :class:`model_selection.GridSearchCV`, :class:`model_selection.RandomizedSearchCV`, and :class:`model_selection.HalvingGridSearchCV` now don't change the given - object in the parameter grid if it's an estimator. :pr:`26786` by `Adrin - Jalali`_. + object in the parameter grid if it's an estimator. + :pr:`26786` by `Adrin Jalali`_. :mod:`sklearn.multioutput` .......................... @@ -602,35 +622,24 @@ Changelog pairs of dense and sparse datasets. :pr:`27018` by :user:`Julien Jerphanion `. -- |API| :class:`neighbors.KNeighborsRegressor` now accepts - :class:`metrics.DistanceMetric` objects directly via the `metric` keyword - argument allowing for the use of accelerated third-party - :class:`metrics.DistanceMetric` objects. 
- :pr:`26267` by :user:`Meekail Zain `. - - |Efficiency| The performance of :meth:`neighbors.RadiusNeighborsClassifier.predict` and of :meth:`neighbors.RadiusNeighborsClassifier.predict_proba` has been improved when `radius` is large and `algorithm="brute"` with non-Euclidean metrics. :pr:`26828` by :user:`Omar Salman `. - |Fix| Improve error message for :class:`neighbors.LocalOutlierFactor` - when it is invoked with `n_samples = n_neighbors`. + when it is invoked with `n_samples=n_neighbors`. :pr:`23317` by :user:`Bharat Raghunathan `. +- |API| :class:`neighbors.KNeighborsRegressor` now accepts + :class:`metrics.DistanceMetric` objects directly via the `metric` keyword + argument allowing for the use of accelerated third-party + :class:`metrics.DistanceMetric` objects. + :pr:`26267` by :user:`Meekail Zain `. + :mod:`sklearn.preprocessing` ............................ -- |MajorFeature| The following classes now support the - `Array API `_. Array API - support is considered experimental and might evolve without being subject to - our usual rolling deprecation cycle policy. See - :ref:`array_api` for more details. - - - :class:`preprocessing.MinMaxScaler` :pr:`26243` by `Tim Head`_ - - :class:`preprocessing.MaxAbsScaler` :pr:`27110` by :user:`Edoardo Abati ` - - :class:`preprocessing.KernelCenterer` :pr:`27556` by :user:`Edoardo Abati ` - - :class:`preprocessing.Normalizer` :pr:`27558` by :user:`Edoardo Abati ` - - |Efficiency| :class:`preprocessing.OrdinalEncoder` avoids calculating missing indices twice to improve efficiency. :pr:`27017` by :user:`Xuefeng Xu `. @@ -644,11 +653,13 @@ Changelog :pr:`26944` by `Thomas Fan`_. - |Enhancement| :class:`preprocessing.TargetEncoder` now supports `target_type` - 'multiclass'. :pr:`26674` by :user:`Lucy Liu `. + 'multiclass'. + :pr:`26674` by :user:`Lucy Liu `. 
- |Fix| :class:`preprocessing.OneHotEncoder` and :class:`preprocessing.OrdinalEncoder` raise an exception when `nan` is a category and is not the last in the user's - provided categories. :pr:`27309` by :user:`Xuefeng Xu `. + provided categories. + :pr:`27309` by :user:`Xuefeng Xu `. - |Fix| :class:`preprocessing.OneHotEncoder` and :class:`preprocessing.OrdinalEncoder` raise an exception if the user provided categories contain duplicates. @@ -688,17 +699,6 @@ Changelog which can be used to check whether a given set of parameters would be consumed. :pr:`26831` by `Adrin Jalali`_. -- |Enhancement| Make :func:`sklearn.utils.check_array` attempt to output - `int32`-indexed CSR and COO arrays when converting from DIA arrays if the number of - non-zero entries is small enough. This ensures that estimators implemented in Cython - and that do not accept `int64`-indexed sparse datastucture, now consistently - accept the same sparse input formats for SciPy sparse matrices and arrays. - :pr:`27372` by :user:`Guillaume Lemaitre `. - -- |Enhancement| :func:`~utils.multiclass.is_multilabel` now supports the Array API - compatible inputs. - :pr:`27601` by :user:`Yaroslav Korobko `. - - |Fix| :func:`sklearn.utils.check_array` should accept both matrix and array from the sparse SciPy module. The previous implementation would fail if `copy=True` by calling specific NumPy `np.may_share_memory` that does not work with SciPy sparse @@ -718,7 +718,15 @@ Changelog - |Fix| Error message in :func:`~utils.check_array` when a sparse matrix was passed but `accept_sparse` is `False` now suggests to use `.toarray()` and not - `X.toarray()`. :pr:`27757` by :user:`Lucy Liu `. + `X.toarray()`. + :pr:`27757` by :user:`Lucy Liu `. + +- |Enhancement| Make :func:`sklearn.utils.check_array` attempt to output + `int32`-indexed CSR and COO arrays when converting from DIA arrays if the number of + non-zero entries is small enough. 
This ensures that estimators implemented in Cython + and that do not accept `int64`-indexed sparse datastucture, now consistently + accept the same sparse input formats for SciPy sparse matrices and arrays. + :pr:`27372` by :user:`Guillaume Lemaitre `. Code and Documentation Contributors ----------------------------------- From da6380b631a6d381ecbfa8056055a07a7841016e Mon Sep 17 00:00:00 2001 From: Guillaume Lemaitre Date: Mon, 11 Dec 2023 15:43:56 +0100 Subject: [PATCH 2/3] Update doc/whats_new/v1.4.rst --- doc/whats_new/v1.4.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/whats_new/v1.4.rst b/doc/whats_new/v1.4.rst index a850f50ca86f0..4d0b9b1bfba68 100644 --- a/doc/whats_new/v1.4.rst +++ b/doc/whats_new/v1.4.rst @@ -196,7 +196,7 @@ Support for Array API Several estimators and functions support the `Array API `_. Such changes allows for using the estimators and functions with other libraries such as JAX, CuPy, and PyTorch. -It therefore enable for some GPU-accelerated computations. +This therefore enables some GPU-accelerated computations. See :ref:`array_api` for more details. From cf216b954e853f31bd5ff747a04cf4400ef1e90b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lo=C3=AFc=20Est=C3=A8ve?= Date: Mon, 11 Dec 2023 17:01:35 +0100 Subject: [PATCH 3/3] Reordering following tags --- doc/whats_new/v1.4.rst | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/doc/whats_new/v1.4.rst b/doc/whats_new/v1.4.rst index 4d0b9b1bfba68..adb362d2802d5 100644 --- a/doc/whats_new/v1.4.rst +++ b/doc/whats_new/v1.4.rst @@ -341,13 +341,6 @@ Changelog :mod:`sklearn.decomposition` ............................ -- |Enhancement| An "auto" option was added to the `n_components` parameter of - :func:`decomposition.non_negative_factorization`, :class:`decomposition.NMF` and - :class:`decomposition.MiniBatchNMF` to automatically infer the number of components - from W or H shapes when using a custom initialization. 
The default value of this - parameter will change from `None` to `auto` in version 1.6. - :pr:`26634` by :user:`Alexandre Landeau ` and :user:`Alexandre Vigny `. - - |Feature| :class:`decomposition.PCA` now supports :class:`scipy.sparse.sparray` and :class:`scipy.sparse.spmatrix` inputs when using the `arpack` solver. When used on sparse data like :func:`datasets.fetch_20newsgroups_vectorized` this @@ -357,6 +350,13 @@ Changelog :pr:`18689` by :user:`Isaac Virshup ` and :user:`Andrey Portnoy `. +- |Enhancement| An "auto" option was added to the `n_components` parameter of + :func:`decomposition.non_negative_factorization`, :class:`decomposition.NMF` and + :class:`decomposition.MiniBatchNMF` to automatically infer the number of components + from W or H shapes when using a custom initialization. The default value of this + parameter will change from `None` to `auto` in version 1.6. + :pr:`26634` by :user:`Alexandre Landeau ` and :user:`Alexandre Vigny `. + - |Fix| :func:`decomposition.dict_learning_online` does not ignore anymore the parameter `max_iter`. :pr:`27834` by :user:`Guillaume Lemaitre `. @@ -699,16 +699,19 @@ Changelog which can be used to check whether a given set of parameters would be consumed. :pr:`26831` by `Adrin Jalali`_. +- |Enhancement| Make :func:`sklearn.utils.check_array` attempt to output + `int32`-indexed CSR and COO arrays when converting from DIA arrays if the number of + non-zero entries is small enough. This ensures that estimators implemented in Cython + and that do not accept `int64`-indexed sparse datastucture, now consistently + accept the same sparse input formats for SciPy sparse matrices and arrays. + :pr:`27372` by :user:`Guillaume Lemaitre `. + - |Fix| :func:`sklearn.utils.check_array` should accept both matrix and array from the sparse SciPy module. 
The previous implementation would fail if `copy=True` by calling specific NumPy `np.may_share_memory` that does not work with SciPy sparse array and does not return the correct result for SciPy sparse matrix. :pr:`27336` by :user:`Guillaume Lemaitre `. -- |API| :func:`sklearn.extmath.log_logistic` is deprecated and will be removed in 1.6. - Use `-np.logaddexp(0, -x)` instead. - :pr:`27544` by :user:`Christian Lorentzen `. - - |Fix| :func:`~utils.estimator_checks.check_estimators_pickle` with `readonly_memmap=True` now relies on joblib's own capability to allocate aligned memory mapped arrays when loading a serialized estimator instead of @@ -721,12 +724,9 @@ Changelog `X.toarray()`. :pr:`27757` by :user:`Lucy Liu `. -- |Enhancement| Make :func:`sklearn.utils.check_array` attempt to output - `int32`-indexed CSR and COO arrays when converting from DIA arrays if the number of - non-zero entries is small enough. This ensures that estimators implemented in Cython - and that do not accept `int64`-indexed sparse datastucture, now consistently - accept the same sparse input formats for SciPy sparse matrices and arrays. - :pr:`27372` by :user:`Guillaume Lemaitre `. +- |API| :func:`sklearn.extmath.log_logistic` is deprecated and will be removed in 1.6. + Use `-np.logaddexp(0, -x)` instead. + :pr:`27544` by :user:`Christian Lorentzen `. Code and Documentation Contributors -----------------------------------
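Note on the `log_logistic` deprecation entry in the last hunk: the suggested replacement `-np.logaddexp(0, -x)` computes log(1 / (1 + exp(-x))) without overflowing for large negative `x`. The sketch below illustrates that identity using only the standard library. The `logaddexp` helper here is a hypothetical stand-in written for this note, assumed to mirror what `np.logaddexp` does for two scalars; it is not scikit-learn or NumPy code.

```python
import math


def logaddexp(a: float, b: float) -> float:
    """Stable log(exp(a) + exp(b)); hypothetical scalar stand-in for np.logaddexp."""
    hi, lo = max(a, b), min(a, b)
    # Factor out the larger exponent so exp() only ever sees a value <= 0.
    return hi + math.log1p(math.exp(lo - hi))


def log_logistic(x: float) -> float:
    """log(1 / (1 + exp(-x))), in the form the changelog entry suggests."""
    return -logaddexp(0.0, -x)


if __name__ == "__main__":
    for x in (-800.0, -2.0, 0.0, 2.0, 800.0):
        print(x, log_logistic(x))
```

A naive `math.log(1 / (1 + math.exp(-x)))` raises `OverflowError` at `x = -800.0`, whereas the form above returns `-800.0`. With NumPy arrays, the one-liner from the changelog entry should be usable directly.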