DOC cleaning up to 0.23/whats new #17015

Merged
merged 6 commits into from
Apr 27, 2020
1 change: 1 addition & 0 deletions doc/whats_new.rst
@@ -12,6 +12,7 @@ on libraries.io to be notified when new versions are released.
.. toctree::
:maxdepth: 1

Version 0.24 <whats_new/v0.24.rst>
Version 0.23 <whats_new/v0.23.rst>
Version 0.22 <whats_new/v0.22.rst>
Version 0.21 <whats_new/v0.21.rst>
225 changes: 137 additions & 88 deletions doc/whats_new/v0.23.rst
@@ -22,14 +22,44 @@ parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- :class:`ensemble.BaggingClassifier`, :class:`ensemble.BaggingRegressor`,
and :class:`ensemble.IsolationForest`. |Fix|

- Any model using the :func:`svm.libsvm` or the :func:`svm.liblinear` solver,
- |Fix| :class:`ensemble.BaggingClassifier`, :class:`ensemble.BaggingRegressor`,
and :class:`ensemble.IsolationForest`.
- |Fix| :class:`cluster.KMeans` with ``algorithm="elkan"`` and
``algorithm="full"``.
- |Fix| :class:`cluster.Birch`
- |Fix| :func:`compose.ColumnTransformer.get_feature_names`
- |Fix| :func:`compose.ColumnTransformer.fit`
- |Fix| :func:`datasets.make_multilabel_classification`
- |Fix| :class:`decomposition.PCA` with `n_components='mle'`
- |Enhancement| :class:`decomposition.NMF` and
:func:`decomposition.non_negative_factorization` with float32 dtype input.
- |Fix| :func:`decomposition.KernelPCA.inverse_transform`
- |API| :class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor`
- |Fix| ``estimator_samples_`` in :class:`ensemble.BaggingClassifier`,
:class:`ensemble.BaggingRegressor` and :class:`ensemble.IsolationForest`
- |Fix| :class:`ensemble.StackingClassifier` and
:class:`ensemble.StackingRegressor` with `sample_weight`
- |Fix| :class:`gaussian_process.GaussianProcessRegressor`
- |Fix| :class:`linear_model.RANSACRegressor` with ``sample_weight``.
- |Fix| :class:`linear_model.RidgeClassifierCV`
- |Fix| :func:`metrics.mean_squared_error` with `squared` and
`multioutput='raw_values'`.
- |Fix| :func:`metrics.mutual_info_score` with negative scores.
- |Fix| :func:`metrics.confusion_matrix` with zero length `y_true` and `y_pred`
- |Fix| :class:`neural_network.MLPClassifier`
- |Fix| :class:`preprocessing.StandardScaler` with `partial_fit` and sparse
input.
- |Fix| :class:`preprocessing.Normalizer` with ``norm='max'``
- |Fix| Any model using the :func:`svm.libsvm` or the :func:`svm.liblinear` solver,
including :class:`svm.LinearSVC`, :class:`svm.LinearSVR`,
:class:`svm.NuSVC`, :class:`svm.NuSVR`, :class:`svm.OneClassSVM`,
:class:`svm.SVC`, :class:`svm.SVR`, :class:`linear_model.LogisticRegression`.
|Efficiency| |Fix|
- |Fix| :class:`tree.DecisionTreeClassifier`, :class:`tree.ExtraTreeClassifier` and
:class:`ensemble.GradientBoostingClassifier` as well as ``predict`` method of
:class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeRegressor`, and
:class:`ensemble.GradientBoostingRegressor` and read-only float32 input in
``predict``, ``decision_path`` and ``predict_proba``.

Details are listed in the changelog below.

@@ -53,19 +83,29 @@ Changelog
:mod:`sklearn.cluster`
......................

- |Enhancement| :class:`cluster.AgglomerativeClustering` has a faster and more
more memory efficient implementation of single linkage clustering.
:pr:`11514` by :user:`Leland McInnes <lmcinnes>`.
- |Fix| :class:`cluster.KMeans` with ``algorithm="elkan"`` now converges with
``tol=0`` as with the default ``algorithm="full"``. :pr:`16075` by
:user:`Erich Schubert <kno10>`.

- |Efficiency| :class:`cluster.Birch` implementation of the predict method
avoids high memory footprint by calculating the distances matrix using
a chunked scheme.
:pr:`16149` by :user:`Jeremie du Boisberranger <jeremiedbb>` and
:user:`Alex Shacked <alexshacked>`.

- |Efficiency| The critical parts of :class:`cluster.KMeans` have a more
optimized implementation. Parallelism is now over the data instead of over
initializations allowing better scalability. :pr:`11950` by
:user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Enhancement| :class:`cluster.KMeans` now supports sparse data when
`algorithm="elkan"`. :pr:`11950` by
:user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Enhancement| :class:`cluster.AgglomerativeClustering` has a faster and more
memory efficient implementation of single linkage clustering.
:pr:`11514` by :user:`Leland McInnes <lmcinnes>`.

- |Fix| :class:`cluster.KMeans` with ``algorithm="elkan"`` now converges with
``tol=0`` as with the default ``algorithm="full"``. :pr:`16075` by
:user:`Erich Schubert <kno10>`.

- |Fix| Fixed a bug in :class:`cluster.Birch` where the `n_clusters` parameter
could not have a `np.int64` type. :pr:`16484`
by :user:`Jeremie du Boisberranger <jeremiedbb>`.
@@ -81,47 +121,28 @@ Changelog
deprecated. It has no effect. :pr:`11950` by
:user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Efficiency| The critical parts of :class:`cluster.KMeans` have a more
optimized implementation. Parallelism is now over the data instead of over
initializations allowing better scalability. :pr:`11950` by
:user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Enhancement| :class:`cluster.KMeans` now supports sparse data when
`algorithm="elkan"`. :pr:`11950` by
:user:`Jeremie du Boisberranger <jeremiedbb>`.
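
The chunked scheme mentioned in the :class:`cluster.Birch` entry of this section can be pictured with a small pure-Python sketch. It is not scikit-learn's implementation: ``chunked_nearest_centroid`` and ``chunk_size`` are hypothetical names, and plain lists stand in for arrays. The point it illustrates is that only one block of the distance matrix exists at a time.

```python
def chunked_nearest_centroid(points, centroids, chunk_size=2):
    """Return the index of the nearest centroid for every point.

    Hypothetical helper: only `chunk_size` rows of the distance matrix
    are held in memory at any moment.
    """
    labels = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        # Squared-distance block of shape (len(chunk), len(centroids)).
        block = [
            [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
            for point in chunk
        ]
        # Reduce the block to labels before moving on to the next chunk.
        labels.extend(row.index(min(row)) for row in block)
    return labels


points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (4.8, 5.1), (0.1, -0.1)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
labels = chunked_nearest_centroid(points, centroids, chunk_size=2)
```

The result does not depend on ``chunk_size``; the parameter only trades peak memory against the number of passes through the loop.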

:mod:`sklearn.compose`
......................

- |Fix| :class:`compose.ColumnTransformer` method ``get_feature_names`` now
returns correct results when one of the transformer steps applies on an
empty list of columns. :pr:`15963` by `Roman Yurchak`_.

- |Efficiency| :class:`compose.ColumnTransformer` is now faster when working
with dataframes and strings are used to specify subsets of data for
transformers. :pr:`16431` by `Thomas Fan`_.

- |Fix| :func:`compose.ColumnTransformer.fit` will error when selecting
a column name that is not unique in the dataframe. :pr:`16431` by
`Thomas Fan`_.

- |Enhancement| :class:`compose.ColumnTransformer` method ``get_feature_names``
now supports `'passthrough'` columns, with the feature name being either
the column name for a dataframe, or `'xi'` for column index `i`.
:pr:`14048` by :user:`Lewis Ball <lrjball>`.

:mod:`sklearn.datasets`
.......................
- |Fix| :class:`compose.ColumnTransformer` method ``get_feature_names`` now
returns correct results when one of the transformer steps applies on an
empty list of columns. :pr:`15963` by `Roman Yurchak`_.

- |Enhancement| Added ``return_centers`` parameter in
:func:`datasets.make_blobs`, which can be used to return
centers for each cluster.
:pr:`15709` by :user:`<shivamgargsya>` and
:user:`Venkatachalam N <venkyyuvy>`.
- |Fix| :func:`compose.ColumnTransformer.fit` will error when selecting
a column name that is not unique in the dataframe. :pr:`16431` by
`Thomas Fan`_.

- |Enhancement| Functions :func:`datasets.make_circles` and
:func:`datasets.make_moons` now accept a two-element tuple.
:pr:`15707` by :user:`Maciej J Mikulski <mjmikulski>`.
:mod:`sklearn.datasets`
.......................

- |Feature| :func:`datasets.fetch_california_housing` now supports
heterogeneous data using pandas by setting `as_frame=True`. :pr:`15950`
@@ -134,27 +155,40 @@ Changelog
``DataFrame`` by setting `as_frame=True`. :pr:`15980` by :user:`wconnell` and
:user:`Reshama Shaikh <reshamas>`.

- |Enhancement| Added ``return_centers`` parameter in
:func:`datasets.make_blobs`, which can be used to return
centers for each cluster.
:pr:`15709` by :user:`<shivamgargsya>` and
:user:`Venkatachalam N <venkyyuvy>`.

- |Enhancement| Functions :func:`datasets.make_circles` and
:func:`datasets.make_moons` now accept a two-element tuple.
:pr:`15707` by :user:`Maciej J Mikulski <mjmikulski>`.

- |Fix| :func:`datasets.make_multilabel_classification` now raises a
`ValueError` for arguments `n_classes < 1` or `length < 1`.
:pr:`16006` by :user:`Rushabh Vasani <rushabh-v>`.

:mod:`sklearn.decomposition`
............................

- |Enhancement| :class:`decomposition.NMF` and
:func:`decomposition.non_negative_factorization` now preserve float32 dtype.
:pr:`16280` by :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Enhancement| :func:`TruncatedSVD.transform` is now faster on given sparse
``csc`` matrices. :pr:`16837` by :user:`wornbb`.

- |Fix| :class:`decomposition.PCA` with a float `n_components` parameter now
selects the smallest number of components whose cumulative explained variance
is strictly greater than `n_components`. :pr:`15669` by
:user:`Krishna Chaitanya <krishnachaitanya9>`.

- |Fix| :class:`decomposition.PCA` with `n_components='mle'` now correctly
handles small eigenvalues, and does not infer 0 as the correct number of
components. :pr: `4441` by :user:`Lisa Schwetlick <lschwetlick>`, and
components. :pr:`16224` by :user:`Lisa Schwetlick <lschwetlick>`, and
:user:`Gelavizh Ahmadi <gelavizh1>` and :user:`Marija Vlajic Wheeler
<marijavlajic>` and :pr:`16841` by `Nicolas Hug`_.

- |Enhancement| :class:`decomposition.NMF` and
:func:`decomposition.non_negative_factorization` now preserve float32 dtype.
:pr:`16280` by :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Fix| :class:`decomposition.KernelPCA` method ``inverse_transform`` now
applies the correct inverse transform to the transformed data. :pr:`16655`
by :user:`Lewis Ball <lrjball>`.
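
The float ``n_components`` rule from the :class:`decomposition.PCA` entry in this section can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: ``n_components_for_variance`` is a hypothetical helper, and the ratios are made-up values chosen to be exact in binary floating point.

```python
from itertools import accumulate


def n_components_for_variance(explained_variance_ratio, threshold):
    """Smallest k whose cumulative explained variance ratio exceeds `threshold`.

    Hypothetical helper; the comparison is strict, so a cumulative ratio
    that merely equals the threshold is not enough.
    """
    cumulative = accumulate(explained_variance_ratio)
    for k, total in enumerate(cumulative, start=1):
        if total > threshold:
            return k
    return len(explained_variance_ratio)


ratios = [0.5, 0.25, 0.25]  # made-up explained variance ratios
k_at_boundary = n_components_for_variance(ratios, 0.75)    # 0.75 is not > 0.75
k_below_boundary = n_components_for_variance(ratios, 0.7)  # 0.75 > 0.7
```

With these values, two components explain exactly 0.75 of the variance, so a threshold of 0.75 needs a third component while a threshold of 0.7 is satisfied by two.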
@@ -170,9 +204,22 @@ Changelog
:class:`ensemble.HistGradientBoostingRegressor` now support
:term:`sample_weight`. :pr:`14696` by `Adrin Jalali`_ and `Nicolas Hug`_.

- |Feature| Early stopping in
:class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor` is now determined with a
new `early_stopping` parameter instead of `n_iter_no_change`. Default value
is 'auto', which enables early stopping if there are at least 10,000
samples in the training set. :pr:`14516` by :user:`Johann Faouzi
<johannfaouzi>`.

- |Feature| :class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor` now support monotonic
constraints, useful when features are supposed to have a positive/negative
effect on the target. :pr:`15582` by `Nicolas Hug`_.

- |API| Added boolean `verbose` flag to classes:
:class:`ensemble.VotingClassifier` and :class:`ensemble.VotingRegressor`.
:pr:`15991` by :user:`Sam Bail <spbail>`,
:pr:`16069` by :user:`Sam Bail <spbail>`,
:user:`Hanna Bruce MacDonald <hannahbrucemacdonald>`,
:user:`Reshama Shaikh <reshamas>`, and
:user:`Chiara Marmo <cmarmo>`.
@@ -187,20 +234,7 @@ Changelog
:class:`ensemble.HistGradientBoostingRegressor`. The depth now corresponds to
the number of edges to go from the root to the deepest leaf.
Stumps (trees with one split) are now allowed.
:pr: `16182` by :user:`Santhosh B <santhoshbala18>`

- |Feature| Early stopping in
:class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor` is now determined with a
new `early_stopping` parameter instead of `n_iter_no_change`. Default value
is 'auto', which enables early stopping if there are at least 10,000
samples in the training set. :pr:`14516` by :user:`Johann Faouzi
<johannfaouzi>`.

- |Feature| :class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor` now support monotonic
constraints, useful when features are supposed to have a positive/negative
effect on the target. :pr:`15582` by `Nicolas Hug`_.
:pr:`16182` by :user:`Santhosh B <santhoshbala18>`

- |Fix| Fixed a bug in :class:`ensemble.BaggingClassifier`,
:class:`ensemble.BaggingRegressor` and :class:`ensemble.IsolationForest`
@@ -274,18 +308,23 @@ Changelog
:class:`linear_model.Lasso` for dense feature matrix `X`.
:pr:`15436` by :user:`Christian Lorentzen <lorentzenchr>`.

- |Fix| Fixed a bug where if a `sample_weight` parameter was passed to the fit
method of :class:`linear_model.RANSACRegressor`, it would not be passed to
the wrapped `base_estimator` during the fitting of the final model.
:pr:`15573` by :user:`Jeremy Alexandre <J-A16>`.

- |Efficiency| :class:`linear_model.RidgeCV` and
:class:`linear_model.RidgeClassifierCV` no longer allocate a
potentially large array to store dual coefficients for all hyperparameters
during `fit`, nor an array to store all errors or LOO predictions, unless
`store_cv_values` is `True`.
:pr:`15652` by :user:`Jérôme Dockès <jeromedockes>`.

- |Enhancement| :class:`linear_model.LassoLars` and
:class:`linear_model.Lars` now support a `jitter` parameter that adds
random noise to the target. This might help with stability in some edge
cases. :pr:`15179` by :user:`angelaambroz`.

- |Fix| Fixed a bug where if a `sample_weight` parameter was passed to the fit
method of :class:`linear_model.RANSACRegressor`, it would not be passed to
the wrapped `base_estimator` during the fitting of the final model.
:pr:`15773` by :user:`Jeremy Alexandre <J-A16>`.

- |Fix| Added a `best_score_` attribute to :class:`linear_model.RidgeCV` and
:class:`linear_model.RidgeClassifierCV`.
:pr:`15653` by :user:`Jérôme Dockès <jeromedockes>`.
@@ -295,6 +334,11 @@ Changelog
instead of predictions.
:pr:`14848` by :user:`Venkatachalam N <venkyyuvy>`.

- |Fix| :class:`linear_model.LogisticRegression` will now avoid an unnecessary
iteration when `solver='newton-cg'` by checking whether the maximum of
`absgrad` is less than or equal to `tol`, instead of strictly less than,
in `utils.optimize._newton_cg`.
:pr:`16266` by :user:`Rushabh Vasani <rushabh-v>`.

- |API| Deprecated public attributes `standard_coef_`, `standard_intercept_`,
`average_coef_`, and `average_intercept_` in
:class:`linear_model.SGDClassifier`,
@@ -303,31 +347,15 @@ Changelog
:class:`linear_model.PassiveAggressiveRegressor`.
:pr:`16261` by :user:`Carlos Brandt <chbrandt>`.

- |Fix| :class:`linear_model.LogisticRegression` will now avoid an unnecessary
iteration when `solver='newton-cg'` by checking whether the maximum of
`absgrad` is less than or equal to `tol`, instead of strictly less than,
in `utils.optimize._newton_cg`.
:pr:`16266` by :user:`Rushabh Vasani <rushabh-v>`.

- |Fix| |Efficiency| :class:`linear_model.ARDRegression` is more stable and
much faster when `n_samples > n_features`. It can now scale to hundreds of
thousands of samples. The stability fix might imply changes in the number
of non-zero coefficients and in the predicted output. :pr:`16849` by
`Nicolas Hug`_.

- |Enhancement| :class:`linear_model.LassoLars` and
:class:`linear_model.Lars` now support a `jitter` parameter that adds
random noise to the target. This might help with stability in some edge
cases. :pr:`15179` by :user:`angelaambroz`.
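
The ``newton-cg`` stopping check described in the :class:`linear_model.LogisticRegression` entry of this section comes down to the difference between ``<`` and ``<=``. The sketch below is not the code in ``utils.optimize._newton_cg``: ``iterations_until_converged`` is a hypothetical helper and the gradient history is invented so that the largest component lands exactly on ``tol``.

```python
import operator


def iterations_until_converged(gradients, tol, compare):
    """Return the first iteration at which compare(max |gradient|, tol) holds.

    Hypothetical helper: `gradients` is a precomputed list of gradient
    vectors, one per iteration.
    """
    for n_iter, grad in enumerate(gradients):
        absgrad = [abs(g) for g in grad]
        if compare(max(absgrad), tol):
            return n_iter
    return len(gradients)


# Invented history: the largest gradient component halves at every iteration.
gradients = [[1.0, -0.5], [0.5, -0.25], [0.25, -0.125], [0.125, -0.0625]]
tol = 0.25

strict = iterations_until_converged(gradients, tol, operator.lt)
inclusive = iterations_until_converged(gradients, tol, operator.le)
```

When the largest gradient component equals ``tol`` exactly, the strict comparison keeps going for one more iteration, while the inclusive comparison stops immediately.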

:mod:`sklearn.metrics`
......................

- |API| Changed the formatting of values in
:meth:`metrics.ConfusionMatrixDisplay.plot` and
:func:`metrics.plot_confusion_matrix` to pick the shorter format (either '2g'
or 'd'). :pr:`16159` by :user:`Rick Mackenbach <Rick-Mackenbach>` and
`Thomas Fan`_.

- |Enhancement| :func:`metrics.pairwise.pairwise_distances_chunked` now allows
its ``reduce_func`` to not have a return value, enabling in-place operations.
:pr:`16397` by `Joel Nothman`_.
@@ -345,6 +373,12 @@ Changelog
the `labels` parameter.
:pr:`16442` by :user:`Kyle Parsons <parsons-kyle-89>`.

- |API| Changed the formatting of values in
:meth:`metrics.ConfusionMatrixDisplay.plot` and
:func:`metrics.plot_confusion_matrix` to pick the shorter format (either '2g'
or 'd'). :pr:`16159` by :user:`Rick Mackenbach <Rick-Mackenbach>` and
`Thomas Fan`_.
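
The "shorter format" rule in the confusion-matrix entry of this section can be illustrated as follows. This is a sketch under assumptions, not the plotting code itself: ``shorter_format`` is a hypothetical helper, ``'2g'`` is read here as the Python format spec ``".2g"``, and ties are resolved in favor of ``".2g"``.

```python
def shorter_format(value):
    """Render an integer cell with whichever of '.2g' or 'd' is shorter.

    Hypothetical helper; on a tie the '.2g' rendering is kept (assumption).
    """
    text_g = format(value, ".2g")
    text_d = format(value, "d")
    return text_d if len(text_d) < len(text_g) else text_g


cells = [7, 12, 1234, 123456, 1234567]
rendered = [shorter_format(v) for v in cells]
```

Small counts look the same either way, mid-sized counts keep all their digits because the plain integer is shorter than scientific notation, and only large counts fall back to the compact exponent form.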

:mod:`sklearn.model_selection`
..............................

@@ -394,14 +428,14 @@ Changelog
:mod:`sklearn.preprocessing`
............................

- |Efficiency| :class:`preprocessing.OneHotEncoder` is now faster at
transforming. :pr:`15762` by `Thomas Fan`_.

- |Feature| argument `drop` of :class:`preprocessing.OneHotEncoder`
will now accept value 'if_binary' and will drop the first category of
each feature with two categories. :pr:`16245`
by :user:`Rushabh Vasani <rushabh-v>`.

- |Efficiency| :class:`preprocessing.OneHotEncoder` is now faster at
transforming. :pr:`15762` by `Thomas Fan`_.

- |Fix| Fix a bug in :class:`preprocessing.StandardScaler` which was incorrectly
computing statistics when calling `partial_fit` on sparse inputs.
:pr:`16466` by :user:`Guillaume Lemaitre <glemaitre>`.
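
The ``'if_binary'`` behaviour described in the :class:`preprocessing.OneHotEncoder` entry of this section can be pictured with a toy encoder. It is a pure-Python sketch, not the library's implementation: ``one_hot_if_binary`` is a hypothetical function, categories are assumed to be sorted, and the data are invented.

```python
def one_hot_if_binary(rows):
    """One-hot encode `rows`, dropping the first category of binary features.

    Hypothetical helper: returns the kept categories per feature and the
    encoded rows as lists of 0/1 values.
    """
    n_features = len(rows[0])
    categories = [sorted({row[j] for row in rows}) for j in range(n_features)]
    # Features with exactly two categories keep only the second one.
    kept = [cats[1:] if len(cats) == 2 else cats for cats in categories]
    encoded = [
        [1 if row[j] == cat else 0 for j in range(n_features) for cat in kept[j]]
        for row in rows
    ]
    return kept, encoded


rows = [("male", "red"), ("female", "green"), ("female", "blue")]
kept, encoded = one_hot_if_binary(rows)
```

The binary first feature collapses to a single indicator column, while the three-category second feature keeps all of its columns.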
@@ -434,16 +468,16 @@ Changelog
number of samples (LibSVM) or the number of features (LibLinear) is large.
:pr:`13511` by :user:`Sylvain Marié <smarie>`.

- |API| :class:`svm.SVR` and :class:`svm.OneClassSVM` attributes, `probA_` and
`probB_`, are now deprecated as they were not useful. :pr:`15558` by
`Thomas Fan`_.

- |Fix| Fix use of custom kernel not taking float entries such as string
kernels in :class:`svm.SVC` and :class:`svm.SVR`. Note that custom kernels
are now expected to validate their input where they previously received
valid numeric arrays.
:pr:`11296` by `Alexandre Gramfort`_ and :user:`Georgi Peev <georgipeev>`.

- |API| :class:`svm.SVR` and :class:`svm.OneClassSVM` attributes, `probA_` and
`probB_`, are now deprecated as they were not useful. :pr:`15558` by
`Thomas Fan`_.

:mod:`sklearn.tree`
...................

@@ -483,14 +517,29 @@ Changelog
Miscellaneous
.............

- |Enhancement| ``scikit-learn`` now works with ``mypy`` without errors.
:pr:`16726` by `Roman Yurchak`_.

- |API| Most estimators now expose a `n_features_in_` attribute. This
attribute is equal to the number of features passed to the `fit` method.
See `SLEP010
<https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep010/proposal.html>`_
for details. :pr:`16112` and :pr:`16622` by `Nicolas Hug`_.
for details. :pr:`16112` by `Nicolas Hug`_.

- |API| Estimators now have a `requires_y` tag which is False by default
except for estimators that inherit from `~sklearn.base.RegressorMixin` or
`~sklearn.base.ClassifierMixin`. This tag is used to ensure that a proper
error message is raised when y was expected but None was passed.
:pr:`16622` by `Nicolas Hug`_.

- |API| Most constructor and function parameters are now expected to be passed

Review comment (Member): I was wondering whether we should have a top-level note for this one? Like we did in 0.22 for the deprecations.

Reply (Member Author): Yeah, I was thinking about having a dedicated deprecation section too.

Reply (Member): Wanna do it? I'm happy to help if needed.

Reply (Member Author): It'd be really nice if you did this part.

as a keyword and not positional. :issue:`15005` by `Joel Nothman`_,
`Adrin Jalali`_, `Thomas Fan`_, and `Nicolas Hug`_. See `SLEP009
<https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html>`_
for more details.
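
Two of the conventions in this section, keyword-only parameters and the ``n_features_in_`` attribute, can be shown together on a toy estimator. ``ToyMeanRegressor`` and its ``shrinkage`` parameter are hypothetical and not part of scikit-learn. The sketch also uses Python's bare ``*`` marker, which rejects positional use outright; how the library itself enforces the transition is not shown here.

```python
class ToyMeanRegressor:
    """Hypothetical estimator that predicts a shrunken mean of the targets."""

    def __init__(self, *, shrinkage=0.0):
        # The bare `*` makes `shrinkage` keyword-only.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        # Record how many features were seen during fit.
        self.n_features_in_ = len(X[0])
        self.mean_ = (1.0 - self.shrinkage) * sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


est = ToyMeanRegressor(shrinkage=0.5).fit([[1, 2, 3], [4, 5, 6]], [1.0, 3.0])
predictions = est.predict([[7, 8, 9]])

try:
    ToyMeanRegressor(0.5)  # positional use of a keyword-only parameter
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

After ``fit``, ``est.n_features_in_`` holds the number of columns in ``X``, and the positional call fails because ``shrinkage`` can only be given by name.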

Code and Documentation Contributors
-----------------------------------

Thanks to everyone who has contributed to the maintenance and improvement of the
project since version 0.20, including: