
Division by zero in OneVsRest's predict_proba (see #31224); bugfix for test_ovo_decision_function #31228

Merged: 11 commits, May 7, 2025
@@ -0,0 +1,5 @@
+- The `predict_proba` method of :class:`sklearn.multiclass.OneVsRestClassifier` now
+  returns zero for all classes when all inner estimators never predict their positive
+  class.
+  By :user:`Luis M. B. Varona <Luis-Varona>`, :user:`Marc Bresson <MarcBresson>`, and
+  :user:`Jérémie du Boisberranger <jeremiedbb>`.
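
For illustration only, here is a minimal standalone sketch of the normalization behavior this entry describes: rows that sum to zero are left as zeros instead of being divided by zero. The helper name `normalize_rows` and the sample scores are hypothetical and not part of scikit-learn; only `np.sum` and `np.divide` with its `out` and `where` arguments are real NumPy calls.

```python
import numpy as np


def normalize_rows(Y):
    """Normalize each row to sum to 1, leaving all-zero rows untouched.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    Y = np.array(Y, dtype=float)  # copy, so the caller's data is not modified
    row_sums = np.sum(Y, axis=1)[:, np.newaxis]
    # Divide only where the row sum is nonzero. Entries excluded by `where`
    # keep whatever value is already stored in `out` (here, the zeros in Y).
    np.divide(Y, row_sums, out=Y, where=row_sums != 0)
    return Y


# Made-up per-class scores: the second row mimics the case where no inner
# estimator ever predicts its positive class.
scores = [[0.2, 0.6, 0.2], [0.0, 0.0, 0.0], [0.5, 0.5, 1.0]]
proba = normalize_rows(scores)
```

With a plain `Y /= row_sums`, the all-zero row would evaluate 0/0 and become NaN (with a runtime warning); in this sketch it stays at zero while the other rows are rescaled to sum to 1.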
6 changes: 4 additions & 2 deletions sklearn/multiclass.py
@@ -553,8 +553,10 @@ def predict_proba(self, X):
             Y = np.concatenate(((1 - Y), Y), axis=1)
 
         if not self.multilabel_:
-            # Then, probabilities should be normalized to 1.
-            Y /= np.sum(Y, axis=1)[:, np.newaxis]
+            # Then, (nonzero) sample probability distributions should be normalized.
+            row_sums = np.sum(Y, axis=1)[:, np.newaxis]
+            np.divide(Y, row_sums, out=Y, where=row_sums != 0)
+
         return Y
 
     @available_if(_estimators_has("decision_function"))
26 changes: 26 additions & 0 deletions sklearn/tests/test_multiclass.py
@@ -6,6 +6,7 @@
 from numpy.testing import assert_allclose
 
 from sklearn import datasets, svm
+from sklearn.base import BaseEstimator, ClassifierMixin
 from sklearn.datasets import load_breast_cancer
 from sklearn.exceptions import NotFittedError
 from sklearn.impute import SimpleImputer
@@ -429,6 +430,31 @@ def test_ovr_single_label_predict_proba():
     assert not (pred - Y_pred).any()
 
 
+def test_ovr_single_label_predict_proba_zero():
+    """Check that predict_proba returns all zeros when the base estimator
+    never predicts the positive class.
+    """
+
+    class NaiveBinaryClassifier(BaseEstimator, ClassifierMixin):
+        def fit(self, X, y):
+            self.classes_ = np.unique(y)
+            return self
+
+        def predict_proba(self, X):
+            proba = np.ones((len(X), 2))
+            # Probability of being the positive class is always 0
+            proba[:, 1] = 0
+            return proba
+
+    base_clf = NaiveBinaryClassifier()
+    X, y = iris.data, iris.target  # Three-class problem with 150 samples
+
+    clf = OneVsRestClassifier(base_clf).fit(X, y)
+    y_proba = clf.predict_proba(X)
+
+    assert_allclose(y_proba, 0.0)
+
+
 def test_ovr_multilabel_decision_function():
     X, Y = datasets.make_multilabel_classification(
         n_samples=100,