Division by zero in OneVsRest's `predict_proba` (see #31224); bugfix for `test_ovo_decision_function` #31228

Luis-Varona · 2025-04-20T01:29:00Z

When every underlying estimator assigns zero probability to a sample, that sample's row in the predicted probability matrix is all zeros, so normalizing the row divides zero by zero and yields NaN values. OneVsRestClassifier's predict_proba method divides each row by its sum without checking for zero row sums:

        # ... (precursor logic)
        if not self.multilabel_:
            # Then, probabilities should be normalized to 1.
            Y /= np.sum(Y, axis=1)[:, np.newaxis]
        return Y

This fixes that behavior by setting all zero row sums to 1:

        # ... (precursor logic)
        if not self.multilabel_:
            # Then, (nonzero) probabilities should be normalized to 1.
            row_sums = np.sum(Y, axis=1)[:, np.newaxis]
            row_sums[row_sums == 0] = 1  # Avoid division by zero
            Y /= row_sums
        return Y

See Issue #31224 for more details.
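For illustration only, here is a minimal standalone sketch of the normalization step outside of scikit-learn. The helper name `normalize_rows` and the sample matrix are hypothetical and are not part of the PR; the sketch only contrasts the unguarded division with the guarded one described above.

```python
import numpy as np


def normalize_rows(Y):
    """Hypothetical helper mirroring the guarded normalization step."""
    Y = np.array(Y, dtype=float)  # work on a float copy of the input
    row_sums = np.sum(Y, axis=1)[:, np.newaxis]
    row_sums[row_sums == 0] = 1  # avoid division by zero
    return Y / row_sums


# Second row: every estimator assigned zero probability to the sample.
Y = np.array([[0.25, 0.25, 0.0], [0.0, 0.0, 0.0]])

# Unguarded division: the all-zero row becomes NaN (warning silenced here).
with np.errstate(invalid="ignore"):
    naive = Y / np.sum(Y, axis=1)[:, np.newaxis]

normalized = normalize_rows(Y)
print(naive)
print(normalized)
```

With the guard, the all-zero row is left as zeros rather than turned into NaN, while every other row still sums to 1.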

github-actions · 2025-04-20T01:29:55Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_Generated for commit: a904988._

jeremiedbb

Thanks for the PR @Luis-Varona. Let's use NumPy's `np.divide` function, which lets you specify where to divide via its `where` argument.

Please add a test in sklearn/tests/test_multiclass.py and an entry in the changelog (see https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md).
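As a rough sketch of what this suggestion could look like (an assumption about the shape of the change, not the code that was merged), `np.divide` accepts `out` and `where` arguments, so rows with a zero sum can simply be skipped. The sample matrix is made up for illustration.

```python
import numpy as np

# Hypothetical unnormalized scores; the second row is all zeros.
Y = np.array([[0.5, 1.5], [0.0, 0.0]])

row_sums = np.sum(Y, axis=1)[:, np.newaxis]

# Divide in place only where the row sum is nonzero. Entries where the
# condition is False keep the value already stored in `out`, which is Y,
# so the all-zero row stays at zero.
np.divide(Y, row_sums, out=Y, where=row_sums != 0)

print(Y)
```

Compared with overwriting zero row sums with 1, this form leaves `row_sums` untouched and states the intent (skip zero-sum rows) directly in the call.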

sklearn/multiclass.py

Luis-Varona · 2025-04-21T13:56:56Z

Will do. 🙂

Refactored the division-by-zero fix in OneVsRest's predict_proba to use np.divide. Documented the change in the changelog. Added test_ovr_single_label_predict_proba_zero_row in tests to validate the bug fix.

Luis-Varona · 2025-04-22T00:27:27Z

@jeremiedbb @MarcBresson I've made the requested changes. (The only thing I'm worried about is whether test_ovr_single_label_predict_proba_zero_row is too long a name for a test function; we already have test_ovr_single_label_predict_proba, and I wanted to extract the logic into a separate function for obvious reasons.) Let me know if everything is good 🙂

sklearn/tests/test_multiclass.py

Removed test (test_ovr_multilabel_predict_proba_zero_row) accidentally left over from prototyping. Changed zero indexing logic in test_ovr_single_label_predict_proba_zero_row to be more streamlined.

MarcBresson · 2025-04-23T14:35:14Z

There is one test failing in https://github.com/scikit-learn/scikit-learn/actions/runs/14620590689/job/41019410025?pr=31228

FAILED tests/test_multiclass.py::test_ovo_decision_function - assert 145 > 146

Luis-Varona · 2025-04-23T14:37:28Z

> There is one test failing in https://github.com/scikit-learn/scikit-learn/actions/runs/14620590689/job/41019410025?pr=31228
>
> FAILED tests/test_multiclass.py::test_ovo_decision_function - assert 145 > 146

@MarcBresson this is not my test; I didn't touch it. Presumably if we want to fix this, it should be in another Issue/PR?

Luis-Varona · 2025-04-23T15:23:10Z

@MarcBresson anything else to fix before merging? Do you really want to address the bug with test_ovo_decision_function in this PR, or just make another PR? (We touched neither the OneVsOneClassifier class nor the test_ovo_decision_function function in this PR, so personally I think we should relegate it to another bug fix Issue, but it's up to you 🙂)

Lowered the threshold of unique values in the iris dataset from >146 to >140 in the test_ovo_decision_function test of the test_multiclass suite. Previous CI tests ended up with 145 unique values, resulting in a failing 'assert 145 > 146' check.

Luis-Varona · 2025-04-26T00:35:14Z

@MarcBresson @jeremiedbb I fixed the test_ovo_decision_function check; it was an overly strict requirement on the OneVsOneClassifier decision function. (It was not strictly related in logic to our issues with the OneVsRestClassifier class, but @MarcBresson requested the change, so I made it, if that makes sense. 🙂) All CI tests are now passing; it should be ready to merge, I think, unless you have any further feedback?

Luis-Varona · 2025-05-06T06:29:46Z

@MarcBresson @jeremiedbb, just wanted to follow up and see if there are any more changes you’d like me to make before merging 🙂

MarcBresson · 2025-05-06T07:11:38Z

Hello @Luis-Varona , I'm not a maintainer, just a happy contributor :)

Reviews from the scikit-learn team can be quite long, as they have a lot on their plate. But don't worry, it will come!

Luis-Varona · 2025-05-06T14:48:59Z

> Hello @Luis-Varona , I'm not a maintainer, just a happy contributor :)
>
> Reviews from the scikit-learn team can be quite long, as they have a lot on their plate. But don't worry, it will come!

Got it, thanks :) Would you still like to discuss the rest of #31224, though?

Luis-Varona · 2025-05-07T18:52:48Z

@jeremiedbb Saw you added some commits; anything else you'd like me to do? 🙂

jeremiedbb

I just simplified the test a bit. LGTM. Thanks @Luis-Varona and @MarcBresson !

Luis-Varona mentioned this pull request Apr 20, 2025

OneVsRestClassifier when all estimators predict a sample belongs to the other classes #31224

Open

Luis-Varona changed the title ~~Fixed #31224 - Division by zero in OneVsRestClassifier's predict_proba method~~ Fixes #31224 - Division by zero in OneVsRestClassifier's predict_proba method Apr 20, 2025

jeremiedbb reviewed Apr 21, 2025

sklearn/multiclass.py (outdated, resolved)

Luis-Varona changed the title ~~Fixes #31224 - Division by zero in OneVsRestClassifier's predict_proba method~~ Division by zero in OneVsRestClassifier's predict_proba (tackles, but keeps open, #31224) Apr 21, 2025

Refactor div-by-zero fix; update changelog, tests

7005d2f

Luis-Varona added 2 commits April 21, 2025 23:08

Merge branch 'main' into 31224-onevsrest-div-by-zero

94745ef

Reformatted code with ruff

4f9fc74

Luis-Varona requested review from jeremiedbb and MarcBresson April 23, 2025 13:43

Merge branch 'main' into 31224-onevsrest-div-by-zero

4891767

MarcBresson reviewed Apr 23, 2025

sklearn/tests/test_multiclass.py (four review threads, all outdated and resolved)

Removed leftover test; refactored zero indexing

caffd8d

Luis-Varona requested a review from MarcBresson April 23, 2025 14:22

Luis-Varona added 2 commits April 25, 2025 20:28

Merge branch 'main' into 31224-onevsrest-div-by-zero

eb6f284

Luis-Varona changed the title ~~Division by zero in OneVsRestClassifier's predict_proba (tackles, but keeps open, #31224)~~ Division by zero in OneVsRest's predict_proba (see #31224); bugfix for test_ovo_decision_function Apr 26, 2025

Merge branch 'main' into 31224-onevsrest-div-by-zero

9c93701

jeremiedbb added 2 commits May 7, 2025 18:56

simplify test

a7b6e55

revert unrelated

a904988

jeremiedbb approved these changes May 7, 2025

jeremiedbb merged commit a5d7f9e into scikit-learn:main May 7, 2025
36 checks passed

Luis-Varona deleted the 31224-onevsrest-div-by-zero branch May 7, 2025 21:28

MarcBresson mentioned this pull request May 28, 2025

ENH: add option in OneVsRest classifier to handle undefined predictions #31448

Open
