ENH add newton-cholesky solver to LogisticRegression #24767

Merged: 14 commits, Nov 3, 2022

doc/modules/linear_model.rst (62 changes: 35 additions & 27 deletions)

@@ -954,7 +954,7 @@
Solvers
-------

The solvers implemented in the class :class:`LogisticRegression`
are "liblinear", "newton-cg", "lbfgs", "sag" and "saga":
are "lbfgs", "liblinear", "newton-cg", "newton-cholesky", "sag" and "saga":

The solver "liblinear" uses a coordinate descent (CD) algorithm, and relies
on the excellent C++ `LIBLINEAR library
@@ -968,7 +968,7 @@
classifiers. For :math:`\ell_1` regularization :func:`sklearn.svm.l1_min_c` allows to
calculate the lower bound for C in order to get a non "null" (all feature
weights to zero) model.
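
As an illustration, a minimal sketch of using :func:`sklearn.svm.l1_min_c` to pick a
workable `C` for an :math:`\ell_1`-penalized fit; the synthetic dataset is only an
assumed stand-in::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import l1_min_c

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Smallest C for which an L1-penalized logistic model is not all-zero.
    c_min = l1_min_c(X, y, loss="log")
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=10 * c_min)
    clf.fit(X, y)
    print((clf.coef_ != 0).sum())  # number of non-zero feature weights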

The "lbfgs", "sag" and "newton-cg" solvers only support :math:`\ell_2`
The "lbfgs", "newton-cg" and "sag" solvers only support :math:`\ell_2`
regularization or no regularization, and are found to converge faster for some
high-dimensional data. Setting `multi_class` to "multinomial" with these solvers
learns a true multinomial logistic regression model [5]_, which means that its
@@ -989,33 +989,41 @@
Broyden–Fletcher–Goldfarb–Shanno algorithm [8]_, which belongs to
quasi-Newton methods. The "lbfgs" solver is recommended for use for
small data-sets but for larger datasets its performance suffers. [9]_
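
For instance, a minimal sketch fitting a true multinomial model with the "lbfgs"
solver on a small three-class dataset::

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    # multi_class="multinomial" learns one joint model over all three
    # classes rather than three one-vs-rest models.
    clf = LogisticRegression(solver="lbfgs", multi_class="multinomial", max_iter=1000)
    clf.fit(X, y)
    print(clf.predict_proba(X[:2]).round(3))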

The "newton-cholesky" solver is an exact Newton solver that calculates the hessian
matrix and solves the resulting linear system. It is a very good choice for
`n_samples` >> `n_features`, but has a few shortcomings: Only :math:`\ell_2`
regularization is supported. Furthermore, because the hessian matrix is explicitly
computed, the memory usage has a quadratic dependency on `n_features` as well as on
`n_classes`. As a consequence, only the one-vs-rest scheme is implemented for the
multiclass case.
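
A minimal sketch of the solver added in this PR, on an assumed synthetic problem in
the `n_samples` >> `n_features` regime it targets::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Tall data: many samples, few features.
    X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
    clf = LogisticRegression(solver="newton-cholesky")  # L2 penalty by default
    clf.fit(X, y)
    print(clf.n_iter_)  # exact Newton steps typically converge in few iterations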

The following table summarizes the penalties supported by each solver:

-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-|                              |                               **Solvers**                                |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| **Penalties**                | **'liblinear'** | **'lbfgs'** | **'newton-cg'** | **'sag'** | **'saga'** |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Multinomial + L2 penalty     |       no        |     yes     |       yes       |    yes    |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| OVR + L2 penalty             |       yes       |     yes     |       yes       |    yes    |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Multinomial + L1 penalty     |       no        |     no      |       no        |    no     |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| OVR + L1 penalty             |       yes       |     no      |       no        |    no     |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Elastic-Net                  |       no        |     no      |       no        |    no     |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| No penalty ('none')          |       no        |     yes     |       yes       |    yes    |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| **Behaviors**                |                                                                          |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Penalize the intercept (bad) |       yes       |     no      |       no        |    no     |    no      |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Faster for large datasets    |       no        |     no      |       no        |    yes    |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Robust to unscaled datasets  |       yes       |     yes     |       yes       |    no     |    no      |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+|                                          |                           **Solvers**                                                |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| **Penalties**                | **'lbfgs'** | **'liblinear'** | **'newton-cg'** | **'newton-cholesky'** | **'sag'** | **'saga'** |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Multinomial + L2 penalty     |     yes     |       no        |       yes       |          no           |    yes    |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| OVR + L2 penalty             |     yes     |       yes       |       yes       |          yes          |    yes    |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Multinomial + L1 penalty     |     no      |       no        |       no        |          no           |    no     |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| OVR + L1 penalty             |     no      |       yes       |       no        |          no           |    no     |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Elastic-Net                  |     no      |       no        |       no        |          no           |    no     |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| No penalty ('none')          |     yes     |       no        |       yes       |          yes          |    yes    |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| **Behaviors**                |                                                                                                  |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Penalize the intercept (bad) |     no      |       yes       |       no        |          no           |    no     |    no      |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Faster for large datasets    |     no      |       no        |       no        |          no           |    yes    |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Robust to unscaled datasets  |     yes     |       yes       |       yes       |          yes          |    no     |    no      |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+

The "lbfgs" solver is used by default for its robustness. For large datasets
the "saga" solver is usually faster.

doc/whats_new/v1.2.rst (5 changes: 3 additions & 2 deletions)

@@ -353,15 +353,16 @@ Changelog
:mod:`sklearn.linear_model`
...........................

-- |Enhancement| :class:`linear_model.GammaRegressor`,
+- |Enhancement| :class:`linear_model.LogisticRegression`,
+:class:`linear_model.LogisticRegressionCV`, :class:`linear_model.GammaRegressor`,
:class:`linear_model.PoissonRegressor` and :class:`linear_model.TweedieRegressor` got
a new solver `solver="newton-cholesky"`. This is a 2nd order (Newton) optimisation
routine that uses a Cholesky decomposition of the Hessian matrix.
When `n_samples >> n_features`, the `"newton-cholesky"` solver has been observed to
converge both faster and to a higher precision solution than the `"lbfgs"` solver on
problems with one-hot encoded categorical variables with some rare categorical
levels.
-:pr:`24637` by :user:`Christian Lorentzen <lorentzenchr>`.
+:pr:`24637` and :pr:`24767` by :user:`Christian Lorentzen <lorentzenchr>`.
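
As an illustration of the entry above, a minimal sketch with one of the listed
estimators on assumed synthetic count data::

    import numpy as np
    from sklearn.linear_model import PoissonRegressor

    rng = np.random.RandomState(0)
    X = rng.normal(size=(50_000, 10))
    y = rng.poisson(lam=np.exp(X[:, 0] / 3))  # non-negative counts

    reg = PoissonRegressor(solver="newton-cholesky")  # new in 1.2
    reg.fit(X, y)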

- |Enhancement| :class:`linear_model.GammaRegressor`,
:class:`linear_model.PoissonRegressor` and :class:`linear_model.TweedieRegressor`

sklearn/linear_model/_glm/glm.py (22 changes: 17 additions & 5 deletions)

@@ -75,7 +75,10 @@ class _GeneralizedLinearRegressor(RegressorMixin, BaseEstimator):
'newton-cholesky'
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to
iterated reweighted least squares) with an inner Cholesky based solver.
-This solver is suited for n_samples >> n_features.
+This solver is a good choice for `n_samples` >> `n_features`, especially
+with one-hot encoded categorical features with rare categories. Be aware
+that the memory usage of this solver has a quadratic dependency on
+`n_features` because it explicitly computes the Hessian matrix.

.. versionadded:: 1.2

@@ -304,7 +307,7 @@ def fit(self, X, y, sample_weight=None):
            coef = sol.solve(X, y, sample_weight)
            self.n_iter_ = sol.iteration
        else:
-            raise TypeError(f"Invalid solver={self.solver}.")
+            raise ValueError(f"Invalid solver={self.solver}.")

        if self.fit_intercept:
            self.intercept_ = coef[-1]
@@ -512,7 +515,10 @@ class PoissonRegressor(_GeneralizedLinearRegressor):
'newton-cholesky'
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to
iterated reweighted least squares) with an inner Cholesky based solver.
-This solver is suited for n_samples >> n_features.
+This solver is a good choice for `n_samples` >> `n_features`, especially
+with one-hot encoded categorical features with rare categories. Be aware
+that the memory usage of this solver has a quadratic dependency on
+`n_features` because it explicitly computes the Hessian matrix.

.. versionadded:: 1.2

@@ -640,7 +646,10 @@ class GammaRegressor(_GeneralizedLinearRegressor):
'newton-cholesky'
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to
iterated reweighted least squares) with an inner Cholesky based solver.
-This solver is suited for n_samples >> n_features.
+This solver is a good choice for `n_samples` >> `n_features`, especially
+with one-hot encoded categorical features with rare categories. Be aware
+that the memory usage of this solver has a quadratic dependency on
+`n_features` because it explicitly computes the Hessian matrix.

.. versionadded:: 1.2

@@ -799,7 +808,10 @@ class TweedieRegressor(_GeneralizedLinearRegressor):
'newton-cholesky'
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to
iterated reweighted least squares) with an inner Cholesky based solver.
-This solver is suited for n_samples >> n_features.
+This solver is a good choice for `n_samples` >> `n_features`, especially
+with one-hot encoded categorical features with rare categories. Be aware
+that the memory usage of this solver has a quadratic dependency on
+`n_features` because it explicitly computes the Hessian matrix.

.. versionadded:: 1.2
