ENH add newton-cholesky solver to LogisticRegression #24767

Merged: 14 commits, Nov 3, 2022

doc/modules/linear_model.rst (62 changes: 35 additions & 27 deletions)

@@ -954,7 +954,7 @@
Solvers
-------

The solvers implemented in the class :class:`LogisticRegression`
are "liblinear", "newton-cg", "lbfgs", "sag" and "saga":
are "lbfgs", "liblinear", "newton-cg", "newton-cholesky", "sag" and "saga":

The solver "liblinear" uses a coordinate descent (CD) algorithm, and relies
on the excellent C++ `LIBLINEAR library
@@ -968,7 +968,7 @@
classifiers. For :math:`\ell_1` regularization :func:`sklearn.svm.l1_min_c` allows to
calculate the lower bound for C in order to get a non "null" (all feature
weights to zero) model.
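
As an illustration, a minimal sketch of using :func:`sklearn.svm.l1_min_c` to pick a
workable `C` for an :math:`\ell_1`-penalized fit; the synthetic dataset is only an
assumed stand-in::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import l1_min_c

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Smallest C for which an L1-penalized logistic model is not all-zero.
    c_min = l1_min_c(X, y, loss="log")
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=10 * c_min)
    clf.fit(X, y)
    print((clf.coef_ != 0).sum())  # number of non-zero feature weights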

The "lbfgs", "sag" and "newton-cg" solvers only support :math:`\ell_2`
The "lbfgs", "newton-cg" and "sag" solvers only support :math:`\ell_2`
regularization or no regularization, and are found to converge faster for some
high-dimensional data. Setting `multi_class` to "multinomial" with these solvers
learns a true multinomial logistic regression model [5]_, which means that its
@@ -989,33 +989,41 @@
Broyden–Fletcher–Goldfarb–Shanno algorithm [8]_, which belongs to
quasi-Newton methods. The "lbfgs" solver is recommended for use for
small data-sets but for larger datasets its performance suffers. [9]_
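
For instance, a minimal sketch fitting a true multinomial model with the "lbfgs"
solver on a small three-class dataset::

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    # multi_class="multinomial" learns one joint model over all three
    # classes rather than three one-vs-rest models.
    clf = LogisticRegression(solver="lbfgs", multi_class="multinomial", max_iter=1000)
    clf.fit(X, y)
    print(clf.predict_proba(X[:2]).round(3))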

The "newton-cholesky" solver is an exact Newton solver that calculates the hessian
matrix and solves the resulting linear system. It is a very good choice for
`n_samples` >> `n_features`, but has a few shortcomings: Only :math:`\ell_2`
regularization is supported. Furthermore, because the hessian matrix is explicitly
computed, the memory usage has a quadratic dependency on `n_features` as well as on
`n_classes`. As a consequence, only the one-vs-rest scheme is implemented for the
multiclass case.
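
A minimal sketch of the solver added in this PR, on an assumed synthetic problem in
the `n_samples` >> `n_features` regime it targets::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Tall data: many samples, few features.
    X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
    clf = LogisticRegression(solver="newton-cholesky")  # L2 penalty by default
    clf.fit(X, y)
    print(clf.n_iter_)  # exact Newton steps typically converge in few iterations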

The following table summarizes the penalties supported by each solver:

-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-|                              |                               **Solvers**                                |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| **Penalties**                | **'liblinear'** | **'lbfgs'** | **'newton-cg'** | **'sag'** | **'saga'** |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Multinomial + L2 penalty     |       no        |     yes     |       yes       |    yes    |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| OVR + L2 penalty             |       yes       |     yes     |       yes       |    yes    |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Multinomial + L1 penalty     |       no        |     no      |       no        |    no     |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| OVR + L1 penalty             |       yes       |     no      |       no        |    no     |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Elastic-Net                  |       no        |     no      |       no        |    no     |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| No penalty ('none')          |       no        |     yes     |       yes       |    yes    |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| **Behaviors**                |                                                                          |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Penalize the intercept (bad) |       yes       |     no      |       no        |    no     |    no      |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Faster for large datasets    |       no        |     no      |       no        |    yes    |    yes     |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
-| Robust to unscaled datasets  |       yes       |     yes     |       yes       |    no     |    no      |
-+------------------------------+-----------------+-------------+-----------------+-----------+------------+
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+|                                          |                           **Solvers**                                                |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| **Penalties**                | **'lbfgs'** | **'liblinear'** | **'newton-cg'** | **'newton-cholesky'** | **'sag'** | **'saga'** |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Multinomial + L2 penalty     |     yes     |       no        |       yes       |          no           |    yes    |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| OVR + L2 penalty             |     yes     |       yes       |       yes       |          yes          |    yes    |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Multinomial + L1 penalty     |     no      |       no        |       no        |          no           |    no     |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| OVR + L1 penalty             |     no      |       yes       |       no        |          no           |    no     |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Elastic-Net                  |     no      |       no        |       no        |          no           |    no     |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| No penalty ('none')          |     yes     |       no        |       yes       |          yes          |    yes    |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| **Behaviors**                |                                                                                                  |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Penalize the intercept (bad) |     no      |       yes       |       no        |          no           |    no     |    no      |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Faster for large datasets    |     no      |       no        |       no        |          no           |    yes    |    yes     |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
+| Robust to unscaled datasets  |     yes     |       yes       |       yes       |          yes          |    no     |    no      |
++------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+

The "lbfgs" solver is used by default for its robustness. For large datasets
the "saga" solver is usually faster.

doc/whats_new/v1.2.rst (5 changes: 3 additions & 2 deletions)

@@ -353,15 +353,16 @@ Changelog
:mod:`sklearn.linear_model`
...........................

-- |Enhancement| :class:`linear_model.GammaRegressor`,
+- |Enhancement| :class:`linear_model.LogisticRegression`,
+:class:`linear_model.LogisticRegressionCV`, :class:`linear_model.GammaRegressor`,
:class:`linear_model.PoissonRegressor` and :class:`linear_model.TweedieRegressor` got
a new solver `solver="newton-cholesky"`. This is a 2nd order (Newton) optimisation
routine that uses a Cholesky decomposition of the Hessian matrix.
When `n_samples >> n_features`, the `"newton-cholesky"` solver has been observed to
converge both faster and to a higher precision solution than the `"lbfgs"` solver on
problems with one-hot encoded categorical variables with some rare categorical
levels.
-:pr:`24637` by :user:`Christian Lorentzen <lorentzenchr>`.
+:pr:`24637` and :pr:`24767` by :user:`Christian Lorentzen <lorentzenchr>`.
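
As an illustration of the entry above, a minimal sketch with one of the listed
estimators on assumed synthetic count data::

    import numpy as np
    from sklearn.linear_model import PoissonRegressor

    rng = np.random.RandomState(0)
    X = rng.normal(size=(50_000, 10))
    y = rng.poisson(lam=np.exp(X[:, 0] / 3))  # non-negative counts

    reg = PoissonRegressor(solver="newton-cholesky")  # new in 1.2
    reg.fit(X, y)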

- |Enhancement| :class:`linear_model.GammaRegressor`,
:class:`linear_model.PoissonRegressor` and :class:`linear_model.TweedieRegressor`

sklearn/linear_model/_glm/glm.py (22 changes: 17 additions & 5 deletions)

@@ -75,7 +75,10 @@ class _GeneralizedLinearRegressor(RegressorMixin, BaseEstimator):
'newton-cholesky'
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to
iterated reweighted least squares) with an inner Cholesky based solver.
-This solver is suited for n_samples >> n_features.
+This solver is a good choice for `n_samples` >> `n_features`, especially
+with one-hot encoded categorical features with rare categories. Be aware
+that the memory usage of this solver has a quadratic dependency on
+`n_features` because it explicitly computes the Hessian matrix.

.. versionadded:: 1.2

@@ -304,7 +307,7 @@ def fit(self, X, y, sample_weight=None):
            coef = sol.solve(X, y, sample_weight)
            self.n_iter_ = sol.iteration
        else:
-            raise TypeError(f"Invalid solver={self.solver}.")
+            raise ValueError(f"Invalid solver={self.solver}.")

        if self.fit_intercept:
            self.intercept_ = coef[-1]
@@ -512,7 +515,10 @@ class PoissonRegressor(_GeneralizedLinearRegressor):
'newton-cholesky'
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to
iterated reweighted least squares) with an inner Cholesky based solver.
-This solver is suited for n_samples >> n_features.
+This solver is a good choice for `n_samples` >> `n_features`, especially
+with one-hot encoded categorical features with rare categories. Be aware
+that the memory usage of this solver has a quadratic dependency on
+`n_features` because it explicitly computes the Hessian matrix.

.. versionadded:: 1.2

@@ -640,7 +646,10 @@ class GammaRegressor(_GeneralizedLinearRegressor):
'newton-cholesky'
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to
iterated reweighted least squares) with an inner Cholesky based solver.
-This solver is suited for n_samples >> n_features.
+This solver is a good choice for `n_samples` >> `n_features`, especially
+with one-hot encoded categorical features with rare categories. Be aware
+that the memory usage of this solver has a quadratic dependency on
+`n_features` because it explicitly computes the Hessian matrix.

.. versionadded:: 1.2

@@ -799,7 +808,10 @@ class TweedieRegressor(_GeneralizedLinearRegressor):
'newton-cholesky'
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to
iterated reweighted least squares) with an inner Cholesky based solver.
-This solver is suited for n_samples >> n_features.
+This solver is a good choice for `n_samples` >> `n_features`, especially
+with one-hot encoded categorical features with rare categories. Be aware
+that the memory usage of this solver has a quadratic dependency on
+`n_features` because it explicitly computes the Hessian matrix.

.. versionadded:: 1.2
