
DEP deprecate multi_class in LogisticRegression #28703


Merged
merged 25 commits into scikit-learn:main on May 3, 2024

Conversation

lorentzenchr
Member

@lorentzenchr lorentzenchr commented Mar 26, 2024

Reference Issues/PRs

Towards #11865.

What does this implement/fix? Explain your changes.

This PR deprecates the multi_class parameter in LogisticRegression. Using that option is equivalent to OneVsRestClassifier(LogisticRegression()), so no functionality is lost, and once gone, it would simplify the code of logreg quite a bit and make it more maintainable.
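As a quick illustrative sketch of that equivalence (example data, solver, and tolerance are arbitrary choices, not from this PR):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_classes=3, n_informative=4, random_state=0)

ovr_builtin = LogisticRegression(multi_class="ovr", solver="lbfgs").fit(X, y)
ovr_meta = OneVsRestClassifier(LogisticRegression(solver="lbfgs")).fit(X, y)

# Both fit one binary logistic regression per class, so the stacked
# coefficients and the predictions should match up to solver tolerance.
coef_meta = np.vstack([est.coef_ for est in ovr_meta.estimators_])
print(np.abs(ovr_builtin.coef_ - coef_meta).max())
assert (ovr_builtin.predict(X) == ovr_meta.predict(X)).all()
```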

Any other comments?

This PR starts very simple with only LogisticRegression. In case of positive feedback, I'll extend it to LogisticRegressionCV and adapt all the docstrings and so on.

@lorentzenchr lorentzenchr added this to the 1.5 milestone Mar 26, 2024

github-actions bot commented Mar 26, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 10e73d8.

@lorentzenchr lorentzenchr added the Needs Decision label Mar 26, 2024
@lorentzenchr
Member Author

@scikit-learn/core-devs friendly ping for visibility.

@jjerphan
Member

I have not had time to look at this PR at all, but before deprecating, can we have tests to make sure the two are equivalent in results and potentially in UX (e.g. would accessing fitted attributes still be possible)?

@agramfort
Member

would be ok for me. I think it's ok to live with

OneVsRestClassifier(LogisticRegression(..))

when using liblinear.

@GaelVaroquaux
Member

Just to make sure that I understand things correctly: the plan is that, by default, multi-class is supported via the multinomial loss, but if the solver is liblinear, multi-class raises an error and the user must use OvR?

If so, that strategy is fine by me, but I think that the deprecation message should first suggest using solvers that support multinomial, or using OvR if liblinear is desired.

@lorentzenchr
Member Author

Just to make sure that I understand things correctly: the plan is that, by default, multi-class is supported via the multinomial loss, but if the solver is liblinear, multi-class raises an error and the user must use OvR?

Yes, exactly. And yes, with an informative deprecation warning, later an error advertising solvers that support the multinomial loss.

Member

@thomasjpfan thomasjpfan left a comment


It also looks like solver="newton-cholesky" does not support ovr.

From a high-level API point of view, LinearSVC only does OvR with liblinear, and LogisticRegression will no longer support it. In the future, should LinearSVC also force users to use OneVsRestClassifier for multi-class problems?

Over the last few years, we have slowly been making meta-estimators more necessary for certain tasks (e.g., the removal of normalize, or this PR). It kind of goes against the history of "let's make estimators easy to use". For example, the classifiers encode string labels to make them easy to use. This is my observation and I am undecided on the current path.

@GaelVaroquaux
Member

GaelVaroquaux commented Mar 28, 2024 via email

@lorentzenchr
Member Author

Over the last few years, we have slowly been making meta-estimators more necessary for certain tasks.

You raise good points. For this particular case, I have 3 answers:

  • Out of the box, logreg will continue to just work. Only if a user wants to use a specific solver will they be directed to the meta-estimator. (We can even extend the newton-cholesky solver to support multinomial, no big deal. Then only liblinear would be left.)
  • From a statistical point of view, multinomial logreg should clearly be favored over OvR logreg. (In fact, I find it hard to even provide a good statistical reference for OvR logreg.)
  • The logic of LogisticRegression will become significantly less "opaque" and the code more maintainable. (And yes, this is a concern, as the code complexity has been a problem in several PRs, making contributing hard.)

@lorentzenchr
Member Author

I interpret the conversation so far as a decision to deprecate multi_class in LogisticRegression.

I have an open question: shall we raise an error for multiclass liblinear, or internally switch to OvR (and state this in the docs)?

@ogrisel
Member

ogrisel commented Apr 10, 2024

I am fine with deprecating the multi_class thingy in LogisticRegression, but not in LinearSVC, because there is no good default alternative to handle the multiclass case for the latter, so that would be a regression from a UX point of view.

@jeremiedbb
Member

We just have a different factor of 2 somewhere which interferes with the penalty parameters.

Yes, I noticed that when setting C=np.inf the coefs differ by exactly a factor of 2.
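For reference, a minimal sketch reproducing that observation (assuming a version where multi_class is still accepted; the comparison holds only up to solver tolerance):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_classes=2, n_samples=1000, random_state=0)

# With C=np.inf the penalty vanishes, so the two parametrizations describe
# the same model and the fitted coefs should differ by exactly a factor of 2.
unpenalized = dict(C=np.inf, solver="lbfgs", tol=1e-10, max_iter=10_000)
logit = LogisticRegression(**unpenalized).fit(X, y)
softmax = LogisticRegression(multi_class="multinomial", **unpenalized).fit(X, y)
print(np.abs(logit.coef_ - 2 * softmax.coef_).max())  # ~0 up to solver tolerance
```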

@lorentzenchr
Member Author

Anything else I can do for this PR?

@jeremiedbb
Member

Thanks for the clarification. There's still a difference, I think, because in binary + multi_class="multinomial", a softmax is used to compute the probabilities. And it looks like it's used by some users (see #9889). We even have special comments in the docs regarding binary + multi_class="multinomial", e.g.:

```
`coef_` is of shape (1, n_features) when the given problem is binary.
In particular, when `multi_class='multinomial'`, `coef_` corresponds
to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).

intercept_ : ndarray of shape (1,) or (n_classes,)
    Intercept (a.k.a. bias) added to the decision function.
    If `fit_intercept` is set to False, the intercept is set to zero.
    `intercept_` is of shape (1,) when the given problem is binary.
    In particular, when `multi_class='multinomial'`, `intercept_`
    corresponds to outcome 1 (True) and `-intercept_` corresponds to
    outcome 0 (False).
```

and dedicated tests like:

```python
def test_multinomial_binary_probabilities(global_random_seed):
    # Test multinomial LR gives expected probabilities based on the
    # decision function, for a binary problem.
    X, y = make_classification(random_state=global_random_seed)
    clf = LogisticRegression(
        multi_class="multinomial",
        solver="saga",
        tol=1e-3,
        random_state=global_random_seed,
    )
    clf.fit(X, y)
    decision = clf.decision_function(X)
    proba = clf.predict_proba(X)
    expected_proba_class_1 = np.exp(decision) / (np.exp(decision) + np.exp(-decision))
    expected_proba = np.c_[1 - expected_proba_class_1, expected_proba_class_1]
    assert_almost_equal(proba, expected_proba)
```

I don't have an opinion on the decision to remove the binary + multi_class="multinomial" use case, I trust you on that, but I think that we need an additional future warning for this special case, because the current warning says that from 1.7 it will always be multinomial, which is not true for binary problems, and we don't propose any alternative.

@lorentzenchr
Member Author

lorentzenchr commented May 2, 2024

I don't have an opinion on the decision to remove the binary + multi_class="multinomial" use case, I trust you on that, but I think that we need an additional future warning for this special case, because the current warning says that from 1.7 it will always be multinomial, which is not true for binary problems, and we don't propose any alternative.

🤔 We don't have good terms available: multi_class : {'auto', 'ovr', 'multinomial'}. There is no "binary", and misusing OvR to mean binary would be awkward (the logic of the code does this, which is very awkward). Therefore, I don't know how to improve the warning; I would just neglect that case.

And we can always improve warning messages later, in case someone comes up with a good idea for how to improve them.

@jeremiedbb
Member

jeremiedbb commented May 2, 2024

I would trigger a dedicated warning if type_of_target(y) == "binary" and multi_class == "multinomial"
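A minimal sketch of what such a dedicated warning might look like (the helper name and message wording are hypothetical, not the code merged in this PR):

```python
import warnings

from sklearn.utils.multiclass import type_of_target


def _warn_binary_multinomial(y, multi_class):
    # Hypothetical helper: warn specifically about the binary +
    # multi_class="multinomial" case, which will switch to the standard
    # binomial (logit) formulation once multi_class is removed.
    if multi_class == "multinomial" and type_of_target(y) == "binary":
        warnings.warn(
            "multi_class='multinomial' with a binary target is deprecated; "
            "in the future, binary problems will always be fit with the "
            "binomial (logit) formulation.",
            FutureWarning,
        )
```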

@GaelVaroquaux
Member

GaelVaroquaux commented May 2, 2024 via email

@lorentzenchr
Member Author

lorentzenchr commented May 2, 2024

I would trigger a dedicated warning if type_of_target(y) == "binary" and multi_class == "multinomial"

I'll add one.

The risk is to break people's pipelines

A scikit-learn pipeline won't break. It only breaks if you rely on the shapes of coef_ in some analysis after the fit, or in a custom object using LogisticRegression inside. (EDIT: even the shapes are the same.) I'll add that warning.

@lorentzenchr
Member Author

Ready again if CI passes.

Member

@jeremiedbb jeremiedbb left a comment


LGTM. Ping @ogrisel: does your +1 still hold, especially w.r.t. the binary situation?

@ogrisel
Member

ogrisel commented May 2, 2024

Yes, binomial and multinomial are mathematically equivalent for 2 classes: in both cases, the loss is the log loss. We just have a different factor of 2 somewhere which interferes with the penalty parameters.
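As a sketch of where that factor of 2 comes from, assuming the L2-penalized multinomial solution takes the symmetric form $w_1 = -w_0 = w$:

$$
p(y = 1 \mid x) = \frac{e^{w \cdot x}}{e^{w \cdot x} + e^{-w \cdot x}} = \sigma(2\, w \cdot x),
$$

so the equivalent binomial (logit) coefficient is $v = 2w$. The L2 penalty on the full softmax coefficient matrix then reads

$$
\frac{1}{C_{\text{softmax}}} \left( \lVert w \rVert^2 + \lVert -w \rVert^2 \right)
= \frac{2}{C_{\text{softmax}}} \left\lVert \frac{v}{2} \right\rVert^2
= \frac{1}{2\, C_{\text{softmax}}} \lVert v \rVert^2,
$$

which matches the binomial penalty $\lVert v \rVert^2 / C_{\text{logit}}$ exactly when $C_{\text{logit}} = 2\, C_{\text{softmax}}$.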

When the problem is penalized, the (projected) over-parametrized solution (multinomial parametrization with softmax link) and the standard binary solution (binomial parametrization with logit link) can be very different, both in coef space and in function space, and even when setting the tolerance to a very small value:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_classes=2,
    n_samples=1000,
    n_features=5,
    n_informative=2,
    n_redundant=2,
    random_state=0,
)

hparams = dict(
    C=1e-2,
    tol=1e-16,
    solver="lbfgs",
)
logreg_logit = LogisticRegression(**hparams).fit(X, y)
logreg_softmax = LogisticRegression(multi_class="multinomial", **hparams).fit(X, y)

logreg_logit.coef_ / logreg_softmax.coef_
```

```
array([[ 1.59470659,  1.54629272,  1.60829714, -0.25944454,  1.36905979]])
```

```python
import numpy as np

X_test = np.random.RandomState(0).randn(10, 5)
np.abs(
    logreg_logit.predict_proba(X_test)[:, 1] - logreg_softmax.predict_proba(X_test)[:, 1]
).max()
```

```
0.056755531461336606
```

Note that the above results are mostly the same with the other solvers ("newton-cg", "sag" and "saga"), as expected given the strongly convex objective functions.

I confirm that when increasing or decreasing penalization, the two models become equivalent.

The discrepancy between the two solutions also goes away asymptotically with larger sample sizes.

I would not have expected such a strong discrepancy for intermediate penalties and sample sizes, though. But indeed, the penalties are not equivalent in the two formulations.

That being said, I agree that setting multi_class="multinomial" vs multi_class="ovr" to switch between the over-parametrized softmax formulation and the standard binary logit formulation is really counter-intuitive/misleading.

If we want to keep the possibility to fit (over/softmax) parametrized binary logistic regression in the future, we should probably do so with a better name, but I am not sure which.

@lorentzenchr
Member Author

lorentzenchr commented May 2, 2024

If we want to keep the possibility to fit (over/softmax) parametrized binary logistic regression in the future

We don't. At least I don't want that. I consider it bad practice; there are only disadvantages, and I see absolutely no reason beyond curiosity to do so. Furthermore, if you select the penalty via CV, you will end up with models with the same predictions, up to some uncertainty (solver, numerics, ...).

🙏 And I really beg you not to over-complicate this deprecation. 🙏

@GaelVaroquaux
Member

GaelVaroquaux commented May 2, 2024 via email

@lorentzenchr
Member Author

Here is my last drop of rationality:

If we are concerned about the effect on millions of users, then let's do this deprecation. This way, the functionality is still there, just a warning about future removal is emitted. If users really care that much and have good reasons for using an over-parametrized multinomial on a binomial problem, then they will complain about that. The deprecation is a way to get user feedback (it's a communication device anyway).

For info: before #23040, HistGradientBoostingClassifier with loss="categorical_crossentropy" raised a ValueError on binary problems. Now the only option is loss="log_loss", which automatically selects the proper log-loss setting with a single tree instead of two for binary problems, much like what is proposed in this PR. We never got any complaints.

@ogrisel
Member

ogrisel commented May 3, 2024

I agree with @lorentzenchr's rationale above.

Furthermore, I believe that using LogisticRegression(multi_class="multinomial") is actually broken (not doing what the docstring/comments say it does and what users would expect):

```python
if multi_class == "multinomial":
    n_classes = max(2, classes.size)
    if solver in ["lbfgs", "newton-cg"]:
        multi_w0 = np.reshape(w0, (n_classes, -1), order="F")
    else:
        multi_w0 = w0
    if n_classes == 2:
        multi_w0 = multi_w0[1][np.newaxis, :]
    coefs.append(multi_w0.copy())
else:
    coefs.append(w0.copy())
```

Line 577 should actually be:

```python
            multi_w0 = 2 * multi_w0[1][np.newaxis, :]
```

We are missing a factor of 2 to get the correct mathematical equivalence; hence the computed predictions, which use the logistic sigmoid instead of the redundant softmax when n_classes=2, are not the ones we should have obtained at convergence: they are twice too weak (uncertain) in terms of log-odds ratios.

EDIT: Actually, I checked the code in predict_proba and it's consistent with the original multi_w0 = multi_w0[1][np.newaxis, :] line, so there is no bug: it is "correctly" using softmax with duplicated coefs and swapped signs when n_classes == 2 and multi_class == "multinomial". This is overkill and makes it hard to understand what the coef_ values mean in this case, but the estimated class membership probabilities are correct.

@jeremiedbb
Member

I think it's still better to keep this deprecation, because choosing between the two through multi_class="ovr"/"multinomial" is a bad design. If we want to keep the two behaviors, we'd rather introduce a new hyperparameter. We can do that between the RC and the final release, or wait to see if we get negative feedback from the future warning.

Member

@ogrisel ogrisel left a comment


+1 to use the deprecation period to see if any user cares about the over-parametrized formulation.

If so, we will have time to add a better constructor parameter to expose that feature.

@ogrisel
Member

ogrisel commented May 3, 2024

@GaelVaroquaux are you -1 on using the deprecation period to discover if there are users that actually rely on the overparameterized formulation?

I am also -0 on introducing an alternative option to reintroduce this formulation with a cleaner API if we don't have a strong clue that it is actually useful in any way, as it would risk making the code complex for nothing.

@GaelVaroquaux
Member

GaelVaroquaux commented May 3, 2024 via email

@ogrisel
Member

ogrisel commented May 3, 2024

Alright, let's merge this as-is, but be mindful to take user feedback on this issue into consideration during the deprecation period, in case someone does find a valid reason to use the softmax parametrization.

@jeremiedbb jeremiedbb merged commit 20c8e69 into scikit-learn:main May 3, 2024
30 checks passed
@ogrisel
Member

ogrisel commented May 3, 2024

Furthermore, if you select the penalty via CV, you will end up with models with the same predictions, up to some uncertainty (solver, numerics, ...).

For the record, I checked this empirically: using a C_logit = 2 * C_softmax penalty leads to the same solutions along the regularization path (up to small, solver-dependent numerical discrepancies):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
import numpy as np
from time import perf_counter


X, y = make_classification(
    n_classes=2,
    n_samples=1000,
    n_features=5,
    n_informative=2,
    n_redundant=2,
    random_state=0,
)


Cs_logit = np.logspace(-6, 6, 27)
hparams = dict(
    tol=1e-10,
    solver="lbfgs",
    scoring="neg_log_loss",
    max_iter=10_000,
)
tic = perf_counter()
logreg_logit = LogisticRegressionCV(Cs=Cs_logit, **hparams).fit(X, y)
print(
    f"Fit logit model in {perf_counter() - tic:.4f} seconds, optimal C={logreg_logit.C_}"
)
tic = perf_counter()
logreg_softmax = LogisticRegressionCV(
    Cs=Cs_logit / 2, multi_class="multinomial", **hparams
).fit(X, y)
print(
    f"Fit softmax model in {perf_counter() - tic:.4f} seconds, optimal C={logreg_softmax.C_}"
)
np.testing.assert_allclose(logreg_logit.coef_, 2 * logreg_softmax.coef_)

class_idx = 1
np.testing.assert_allclose(
    logreg_logit.scores_[class_idx], logreg_softmax.scores_[class_idx]
)
max_abs_coef_path_diff = np.abs(
    logreg_logit.coefs_paths_[class_idx] - 2 * logreg_softmax.coefs_paths_[class_idx]
).max()
print(f"Max absolute difference in coef paths: {max_abs_coef_path_diff:.2e}")
```

Here is the typical output of that script with the lbfgs solver:

```
Fit logit model in 0.1500 seconds, optimal C=[1.]
Fit softmax model in 0.1527 seconds, optimal C=[0.5]
Max absolute difference in coef paths: 4.69e-07
```

I get similar results with Newton-CG/SAG/SAGA (I had to increase tol a bit, otherwise some solvers would have various kinds of convergence problems during that grid search, but I think this is kind of expected).

So in the end, this confirms Christian's statement, and I cannot imagine any reason that would justify exposing the over-parametrized case in our public API in the future.
