diff --git a/doc/whats_new/v1.3.rst b/doc/whats_new/v1.3.rst index 51bbef681bd91..d865313f86cd8 100644 --- a/doc/whats_new/v1.3.rst +++ b/doc/whats_new/v1.3.rst @@ -29,11 +29,6 @@ random sampling procedures. `transform_algorithm` is not the same as `fit_algorithm` and the number of iterations is small. :pr:`24871` by :user:`Omar Salman `. -- |Fix| Treat more consistently small values in the `W` and `H` matrices during the - `fit` and `transform` steps of :class:`decomposition.NMF` and - :class:`decomposition.MiniBatchNMF` which can produce different results than previous - versions. :pr:`25438` by :user:`Yotam Avidar-Constantini `. - - |Enhancement| The `sample_weight` parameter now will be used in centroids initialization for :class:`cluster.KMeans`, :class:`cluster.BisectingKMeans` and :class:`cluster.MiniBatchKMeans`. @@ -43,6 +38,11 @@ random sampling procedures. :user:`Jérémie du Boisberranger `, :user:`Guillaume Lemaitre `. +- |Fix| Treat more consistently small values in the `W` and `H` matrices during the + `fit` and `transform` steps of :class:`decomposition.NMF` and + :class:`decomposition.MiniBatchNMF` which can produce different results than previous + versions. :pr:`25438` by :user:`Yotam Avidar-Constantini `. + - |Fix| :class:`decomposition.KernelPCA` may produce different results through `inverse_transform` if `gamma` is `None`. Now it will be chosen correctly as `1/n_features` of the data that it is fitted on, while previously it might be @@ -201,13 +201,18 @@ Changelog :mod:`sklearn.cluster` ...................... -- |API| The `sample_weight` parameter in `predict` for - :meth:`cluster.KMeans.predict` and :meth:`cluster.MiniBatchKMeans.predict` - is now deprecated and will be removed in v1.5. - :pr:`25251` by :user:`Gleb Levitski `. +- |MajorFeature| Added :class:`cluster.HDBSCAN`, a modern hierarchical density-based + clustering algorithm. 
Similarly to :class:`cluster.OPTICS`, it can be seen as a + generalization of :class:`cluster.DBSCAN` by allowing for hierarchical instead of flat + clustering, however it varies in its approach from :class:`cluster.OPTICS`. This + algorithm is very robust with respect to its hyperparameters' values and can + be used on a wide variety of data without much, if any, tuning. -- |API| The `Xred` argument in :func:`cluster.FeatureAgglomeration.inverse_transform` - is renamed to `Xt` and will be removed in v1.5. :pr:`26503` by `Adrin Jalali`_. + This implementation is an adaptation from the original implementation of HDBSCAN in + `scikit-learn-contrib/hdbscan `_, + by :user:`Leland McInnes ` et al. + + :pr:`26385` by :user:`Meekail Zain ` - |Enhancement| The `sample_weight` parameter now will be used in centroids initialization for :class:`cluster.KMeans`, :class:`cluster.BisectingKMeans` @@ -218,26 +223,21 @@ Changelog :user:`Jérémie du Boisberranger `, :user:`Guillaume Lemaitre `. -- |MajorFeature| Added :class:`cluster.HDBSCAN`, a modern hierarchical density-based - clustering algorithm. Similarly to :class:`cluster.OPTICS`, it can be seen as a - generalization of :class:`DBSCAN` by allowing for hierarchical instead of flat - clustering, however it varies in its approach from :class:`cluster.OPTICS`. This - algorithm is very robust with respect to its hyperparameters' values and can - be used on a wide variety of data without much, if any, tuning. - - This implementation is an adaptation from the original implementation of HDBSCAN in - `scikit-learn-contrib/hdbscan `_, - by :user:`Leland McInnes ` et al. +- |API| The `sample_weight` parameter in `predict` for + :meth:`cluster.KMeans.predict` and :meth:`cluster.MiniBatchKMeans.predict` + is now deprecated and will be removed in v1.5. + :pr:`25251` by :user:`Gleb Levitski `. 
- :pr:`26385` by :user:`Meekail Zain ` +- |API| The `Xred` argument in :func:`cluster.FeatureAgglomeration.inverse_transform` + is renamed to `Xt` and will be removed in v1.5. :pr:`26503` by `Adrin Jalali`_. :mod:`sklearn.compose` ...................... -- |Fix| `compose.ColumnTransformer` raises an informative error when the - individual transformers of `ColumnTransformer` output pandas dataframes with - indexes that are not consistent with each other and the output is configured - to be pandas. :pr:`26286` by `Thomas Fan`_. +- |Fix| `compose.ColumnTransformer` raises an informative error when the individual + transformers of `ColumnTransformer` output pandas dataframes with indexes that are + not consistent with each other and the output is configured to be pandas. + :pr:`26286` by `Thomas Fan`_. - |Fix| :class:`compose.ColumnTransformer` correctly sets the output of the remainder when `set_output` is called. :pr:`26323` by `Thomas Fan`_. @@ -245,6 +245,14 @@ Changelog :mod:`sklearn.covariance` ......................... +- |Fix| Allows `alpha=0` in :class:`covariance.GraphicalLasso` to be + consistent with :func:`covariance.graphical_lasso`. + :pr:`26033` by :user:`Genesis Valencia `. + +- |Fix| :func:`covariance.empirical_covariance` now gives an informative + error message when input is not appropriate. + :pr:`26108` by :user:`Quentin Barthélemy `. + - |API| Deprecates `cov_init` in :func:`covariance.graphical_lasso` in 1.3 since the parameter has no effect. It will be removed in 1.5. :pr:`26033` by :user:`Genesis Valencia `. @@ -260,20 +268,13 @@ Changelog :func:`covariance.graphical_lasso_path`, and :class:`covariance.GraphicalLassoCV`. :pr:`26033` by :user:`Genesis Valencia `. -- |Fix| Allows `alpha=0` in :class:`covariance.GraphicalLasso` to be - consistent with :func:`covariance.graphical_lasso`. - :pr:`26033` by :user:`Genesis Valencia `. 
- -- |Fix| :func:`covariance.empirical_covariance` now gives an informative - error message when input is not appropriate. - :pr:`26108` by :user:`Quentin Barthélemy `. - :mod:`sklearn.datasets` ....................... -- |API| The `data_transposed` argument of :func:`datasets.make_sparse_coded_signal` - is deprecated and will be removed in v1.5. - :pr:`25784` by :user:`Jérémie du Boisberranger`. +- |Enhancement| Allows to overwrite the parameters used to open the ARFF file using + the parameter `read_csv_kwargs` in :func:`datasets.fetch_openml` when using the + pandas parser. + :pr:`26433` by :user:`Guillaume Lemaitre `. - |Fix| :func:`datasets.fetch_openml` returns improved data types when `as_frame=True` and `parser="liac-arff"`. :pr:`26386` by `Thomas Fan`_. @@ -287,32 +288,31 @@ Changelog with both parsers `"pandas"` and `"liac-arff"`. :pr:`26579` by :user:`Guillaume Lemaitre `. -- |Enhancement| Allows to overwrite the parameters used to open the ARFF file using - the parameter `read_csv_kwargs` in :func:`datasets.fetch_openml` when using the - pandas parser. - :pr:`26433` by :user:`Guillaume Lemaitre `. +- |API| The `data_transposed` argument of :func:`datasets.make_sparse_coded_signal` + is deprecated and will be removed in v1.5. + :pr:`25784` by :user:`Jérémie du Boisberranger`. :mod:`sklearn.decomposition` ............................ -- |Enhancement| :class:`decomposition.DictionaryLearning` now accepts the parameter - `callback` for consistency with the function :func:`decomposition.dict_learning`. - :pr:`24871` by :user:`Omar Salman `. - - |Efficiency| :class:`decomposition.MiniBatchDictionaryLearning` and :class:`decomposition.MiniBatchSparsePCA` are now faster for small batch sizes by avoiding duplicate validations. :pr:`25490` by :user:`Jérémie du Boisberranger `. -- |API| The `W` argument in :func:`decomposition.NMF.inverse_transform` and - :class:`decomposition.MiniBatchNMF.inverse_transform` is renamed to `Xt` and - will be removed in v1.5. 
:pr:`26503` by `Adrin Jalali`_. +- |Enhancement| :class:`decomposition.DictionaryLearning` now accepts the parameter + `callback` for consistency with the function :func:`decomposition.dict_learning`. + :pr:`24871` by :user:`Omar Salman `. - |Fix| Treat more consistently small values in the `W` and `H` matrices during the `fit` and `transform` steps of :class:`decomposition.NMF` and :class:`decomposition.MiniBatchNMF` which can produce different results than previous versions. :pr:`25438` by :user:`Yotam Avidar-Constantini `. +- |API| The `W` argument in :func:`decomposition.NMF.inverse_transform` and + :class:`decomposition.MiniBatchNMF.inverse_transform` is renamed to `Xt` and + will be removed in v1.5. :pr:`26503` by `Adrin Jalali`_. + :mod:`sklearn.discriminant_analysis` .................................... @@ -376,6 +376,7 @@ Changelog :mod:`sklearn.exception` ........................ + - |Feature| Added :class:`exception.InconsistentVersionWarning` which is raised when a scikit-learn estimator is unpickled with a scikit-learn version that is inconsistent with the sckit-learn version the estimator was pickled with. @@ -435,12 +436,6 @@ Changelog now preserve dtype for `numpy.float32`. :pr:`25587` by :user:`Omar Salman `. -- |API| Deprecates `n_iter` in favor of `max_iter` in - :class:`linear_model.BayesianRidge` and :class:`linear_model.ARDRegression`. - `n_iter` will be removed in scikit-learn 1.5. This change makes those - estimators consistent with the rest of estimators. - :pr:`25697` by :user:`John Pangas `. - - |Enhancement| The `n_iter_` attribute has been included in :class:`linear_model.ARDRegression` to expose the actual number of iterations required to reach the stopping criterion. @@ -451,6 +446,12 @@ Changelog on linearly separable problems. :pr:`25214` by `Tom Dupre la Tour`_. +- |API| Deprecates `n_iter` in favor of `max_iter` in + :class:`linear_model.BayesianRidge` and :class:`linear_model.ARDRegression`. 
+ `n_iter` will be removed in scikit-learn 1.5. This change makes those + estimators consistent with the rest of estimators. + :pr:`25697` by :user:`John Pangas `. + :mod:`sklearn.manifold` ....................... @@ -460,33 +461,26 @@ Changelog :mod:`sklearn.metrics` ...................... -- |Efficiency| The computation of the expected mutual information in - :func:`metrics.adjusted_mutual_info_score` is now faster when the number of - unique labels is large and its memory usage is reduced in general. - :pr:`25713` by :user:`Kshitij Mathur `, - :user:`Guillaume Lemaitre `, :user:`Omar Salman ` and - :user:`Jérémie du Boisberranger `. - - |Feature| Adds `zero_division=np.nan` to multiple classification metrics: - :func:`precision_score`, :func:`recall_score`, :func:`f1_score`, - :func:`fbeta_score`, :func:`precision_recall_fscore_support`, - :func:`classification_report`. When `zero_division=np.nan` and there is a + :func:`metrics.precision_score`, :func:`metrics.recall_score`, + :func:`metrics.f1_score`, :func:`metrics.fbeta_score`, + :func:`metrics.precision_recall_fscore_support`, + :func:`metrics.classification_report`. When `zero_division=np.nan` and there is a zero division, the metric is undefined and is excluded from averaging. When not used for averages, the value returned is `np.nan`. :pr:`25531` by :user:`Marc Torrellas Socastro `. -- |Fix| :func:`metric.manhattan_distances` now supports readonly sparse datasets. - :pr:`25432` by :user:`Julien Jerphanion `. - -- |Fix| Fixed :func:`classification_report` so that empty input will return - `np.nan`. Previously, "macro avg" and `weighted avg` would return - e.g. `f1-score=np.nan` and `f1-score=0.0`, being inconsistent. Now, they - both return `np.nan`. - :pr:`25531` by :user:`Marc Torrellas Socastro `. +- |Feature| :func:`metrics.average_precision_score` now supports the + multiclass case. + :pr:`17388` by :user:`Geoffrey Bolmier ` and + :pr:`24769` by :user:`Ashwin Mathur `. 
-- |Fix| :func:`metric.ndcg_score` now gives a meaningful error message for input of - length 1. - :pr:`25672` by :user:`Lene Preuss ` and :user:`Wei-Chun Chu `. +- |Efficiency| The computation of the expected mutual information in + :func:`metrics.adjusted_mutual_info_score` is now faster when the number of + unique labels is large and its memory usage is reduced in general. + :pr:`25713` by :user:`Kshitij Mathur `, + :user:`Guillaume Lemaitre `, :user:`Omar Salman ` and + :user:`Jérémie du Boisberranger `. - |Enhancement| :class:`metrics.silhouette_samples` nows accepts a sparse matrix of pairwise distances between samples, or a feature array. @@ -513,17 +507,23 @@ Changelog chance level. This line is exposed in the `chance_level_` attribute. :pr:`26019` by :user:`Yao Xiao `. -- |Fix| :func:`log_loss` raises a warning if the values of the parameter `y_pred` are - not normalized, instead of actually normalizing them in the metric. Starting from - 1.5 this will raise an error. :pr:`25299` by :user:`Omar Salman `. -- |API| The `eps` parameter of the :func:`log_loss` has been deprecated and will be - removed in 1.5. :pr:`25299` by :user:`Omar Salman `. +- |Fix| Fixed :func:`metrics.classification_report` so that empty input will return + `np.nan`. Previously, "macro avg" and `weighted avg` would return + e.g. `f1-score=np.nan` and `f1-score=0.0`, being inconsistent. Now, they + both return `np.nan`. + :pr:`25531` by :user:`Marc Torrellas Socastro `. -- |Feature| :func:`metrics.average_precision_score` now supports the - multiclass case. - :pr:`17388` by :user:`Geoffrey Bolmier ` and - :pr:`24769` by :user:`Ashwin Mathur `. +- |Fix| :func:`metrics.ndcg_score` now gives a meaningful error message for input of + length 1. + :pr:`25672` by :user:`Lene Preuss ` and :user:`Wei-Chun Chu `. + +- |Fix| :func:`metrics.log_loss` raises a warning if the values of the parameter + `y_pred` are not normalized, instead of actually normalizing them in the metric. 
+ Starting from 1.5 this will raise an error. + :pr:`25299` by :user:`Omar Salman ` +- |API| The `eps` parameter of the :func:`metrics.log_loss` has been deprecated and + will be removed in 1.5. :pr:`25299` by :user:`Omar Salman `. + :mod:`sklearn.gaussian_process` ............................... @@ -567,15 +570,15 @@ Changelog :mod:`sklearn.neighbors` ........................ -- |Fix| Remove support for `KulsinskiDistance` in :class:`neighbors.BallTree`. This - dissimilarity is not a metric and cannot be supported by the BallTree. - :pr:`25417` by :user:`Guillaume Lemaitre `. - - |Enhancement| The performance of :meth:`neighbors.KNeighborsClassifier.predict` and of :meth:`neighbors.KNeighborsClassifier.predict_proba` has been improved when `n_neighbors` is large and `algorithm="brute"` with non Euclidean metrics. :pr:`24076` by :user:`Meekail Zain `, :user:`Julien Jerphanion `. +- |Fix| Remove support for `KulsinskiDistance` in :class:`neighbors.BallTree`. This + dissimilarity is not a metric and cannot be supported by the BallTree. + :pr:`25417` by :user:`Guillaume Lemaitre `. + - |API| The support for metrics other than `euclidean` and `manhattan` and for callables in :class:`neighbors.NearestNeighbors` is deprecated and will be removed in version 1.5. :pr:`24083` by :user:`Valentin Laurent `. @@ -613,10 +616,24 @@ Changelog categorical encoding based on target mean conditioned on the value of the category. :pr:`25334` by `Thomas Fan`_. +- |Feature| :class:`preprocessing.OrdinalEncoder` now supports grouping + infrequent categories into a single feature. Grouping infrequent categories + is enabled by specifying how to select infrequent categories with + `min_frequency` or `max_categories`. :pr:`25677` by `Thomas Fan`_. + +- |Enhancement| :class:`preprocessing.PolynomialFeatures` now calculates the + number of expanded terms a-priori when dealing with sparse `csr` matrices + in order to optimize the choice of `dtype` for `indices` and `indptr`. 
It + can now output `csr` matrices with `np.int32` `indices/indptr` components + when there are few enough elements, and will automatically use `np.int64` + for sufficiently large matrices. + :pr:`20524` by :user:`niuk-a ` and + :pr:`23731` by :user:`Meekail Zain ` + - |Enhancement| A new parameter `sparse_output` was added to - :class:`SplineTransformer`, available as of SciPy 1.8. If `sparse_output=True`, - :class:`SplineTransformer` returns a sparse CSR matrix. - :pr:`24145` by :user:`Christian Lorentzen `. + :class:`preprocessing.SplineTransformer`, available as of SciPy 1.8. If + `sparse_output=True`, :class:`preprocessing.SplineTransformer` returns a sparse + CSR matrix. :pr:`24145` by :user:`Christian Lorentzen `. - |Enhancement| Adds a `feature_name_combiner` parameter to :class:`preprocessing.OneHotEncoder`. This specifies a custom callable to create @@ -631,19 +648,13 @@ Changelog :pr:`24935` by :user:`Seladus `, :user:`Guillaume Lemaitre `, and :user:`Dea María Léon `, :pr:`25257` by :user:`Gleb Levitski `. -- |Feature| :class:`preprocessing.OrdinalEncoder` now supports grouping - infrequent categories into a single feature. Grouping infrequent categories - is enabled by specifying how to select infrequent categories with - `min_frequency` or `max_categories`. :pr:`25677` by `Thomas Fan`_. - - |Enhancement| Subsampling through the `subsample` parameter can now be used in :class:`preprocessing.KBinsDiscretizer` regardless of the strategy used. :pr:`26424` by :user:`Jérémie du Boisberranger `. -- |API| The default value of the `subsample` parameter of - :class:`preprocessing.KBinsDiscretizer` will change from `None` to `200_000` in - version 1.5 when `strategy="kmeans"` or `strategy="uniform"`. - :pr:`26424` by :user:`Jérémie du Boisberranger `. +- |Fix| :class:`preprocessing.AdditiveChi2Sampler` is now stateless. + The `sample_interval_` attribute is deprecated and will be removed in 1.5. + :pr:`25190` by :user:`Vincent Maladière `. 
- |Fix| :class:`AdditiveChi2Sampler` is now stateless. The `sample_interval_` attribute is deprecated and will be removed in 1.5. @@ -661,6 +672,11 @@ Changelog the `lambdas_` fitted parameter. :pr:`26566` by :user:`Jérémie du Boisberranger `. +- |API| The default value of the `subsample` parameter of + :class:`preprocessing.KBinsDiscretizer` will change from `None` to `200_000` in + version 1.5 when `strategy="kmeans"` or `strategy="uniform"`. + :pr:`26424` by :user:`Jérémie du Boisberranger `. + :mod:`sklearn.svm` .................. @@ -689,45 +705,36 @@ Changelog :mod:`sklearn.utils` .................... -- |API| :func:`estimator_checks.check_transformers_unfitted_stateless` has been +- |Fix| Fixes :func:`utils.validation.check_array` to properly convert pandas + extension arrays. :pr:`25813` and :pr:`26106` by `Thomas Fan`_. + +- |Fix| :func:`utils.validation.check_array` now supports pandas DataFrames with + extension arrays and object dtypes by returning an ndarray with object dtype. + :pr:`25814` by `Thomas Fan`_. + +- |API| :func:`utils.estimator_checks.check_transformers_unfitted_stateless` has been introduced to ensure stateless transformers don't raise `NotFittedError` during `transform` with no prior call to `fit` or `fit_transform`. :pr:`25190` by :user:`Vincent Maladière `. -- |Enhancement| :class:`preprocessing.PolynomialFeatures` now calculates the - number of expanded terms a-priori when dealing with sparse `csr` matrices - in order to optimize the choice of `dtype` for `indices` and `indptr`. It - can now output `csr` matrices with `np.int32` `indices/indptr` components - when there are few enough elements, and will automatically use `np.int64` - for sufficiently large matrices. - :pr:`20524` by :user:`niuk-a ` and - :pr:`23731` by :user:`Meekail Zain ` - - |API| A `FutureWarning` is now raised when instantiating a class which inherits from a deprecated base class (i.e.
decorated by :class:`utils.deprecated`) and which overrides the `__init__` method. :pr:`25733` by :user:`Brigitta Sipőcz ` and :user:`Jérémie du Boisberranger `. -- |FIX| Fixes :func:`utils.validation.check_array` to properly convert pandas - extension arrays. :pr:`25813` and :pr:`26106` by `Thomas Fan`_. - -- |Fix| :func:`utils.validation.check_array` now supports pandas DataFrames with - extension arrays and object dtypes by return an ndarray with object dtype. - :pr:`25814` by `Thomas Fan`_. - :mod:`sklearn.semi_supervised` .............................. -- |Enhancement| :meth:`LabelSpreading.fit` and :meth:`LabelPropagation.fit` now - accepts sparse metrics. +- |Enhancement| :meth:`semi_supervised.LabelSpreading.fit` and + :meth:`semi_supervised.LabelPropagation.fit` now accept sparse metrics. :pr:`19664` by :user:`Kaushik Amar Das `. Miscellaneous ............. -- |Enhancement| Replace obsolete exceptions EnvironmentError, IOError and - WindowsError. +- |Enhancement| Replace obsolete exceptions `EnvironmentError`, `IOError` and + `WindowsError`. :pr:`26466` by :user:`Dimitri Papadopoulos ORfanos `. Code and Documentation Contributors