Division by zero in OneVsRest's `predict_proba` (see #31224); bugfix for `test_ovo_decision_function` #31228

Luis-Varona · 2025-04-20T01:29:00Z

When a classifier has zero confidence in a class, zero rows appear in the predicted probability matrix, resulting in undefined behavior during probability normalization. OneVsRestClassifier's predict_proba method attempts to divide by zero row sums:

        .... (precursor logic)
        if not self.multilabel_:
            # Then, probabilities should be normalized to 1.
            Y /= np.sum(Y, axis=1)[:, np.newaxis]
        return Y

This fixes that behavior by setting all zero row sums to 1:

        .... (precursor logic)
        if not self.multilabel_:
            # Then, (nonzero) probabilities should be normalized to 1.
            row_sums = np.sum(Y, axis=1)[:, np.newaxis]
            row_sums[row_sums == 0] = 1 # Avoid division by zero
            Y /= row_sums
        return Y

See Issue #31224 for more details.

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

When a classifier has zero confidence in a class, zero rows appear in the predicted probability matrix, resulting in undefined behavior during probability normalization. OneVsRestClassifier's predict_proba method attempts to divide by zero row sums. This fixes that behavior by setting all zero row sums to 1.

github-actions · 2025-04-20T01:29:55Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_{Generated for commit: 9c93701. Link to the linter CI: here}

jeremiedbb

Thanks for the PR @Luis-Varona. Let's use numpy's divide function which allows to specify where to divide.

Please add a test in sklearn/tests/test_multiclass.py and an entry in the changelog (see https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md).

sklearn/multiclass.py

Luis-Varona · 2025-04-21T13:56:56Z

Will do. 🙂

Refactored the division-by-zero fix in OneVsRest's predict_proba to use np.divide. Documented the change in the changelog. Added test_ovr_single_label_predict_proba_zero_row in tests to validate the bug fix.

Luis-Varona · 2025-04-22T00:27:27Z

@jeremiedbb @MarcBresson I've made the requested changes. (The only thing I'm worried about is whether test_ovr_single_label_predict_proba_zero_row is too long of a test function name; we already have test_ovr_single_label_predict_proba, and I wanted to extract the logic into a separate function for obvious reasons.) Let me know if everything is good 🙂

sklearn/tests/test_multiclass.py

Removed test (test_ovr_multilabel_predict_proba_zero_row) accidentally left over from prototyping. Changed zero indexing logic in test_ovr_single_label_predict_proba_zero_row to be more streamlined.

MarcBresson · 2025-04-23T14:35:14Z

There is one test failing in https://github.com/scikit-learn/scikit-learn/actions/runs/14620590689/job/41019410025?pr=31228

FAILED tests/test_multiclass.py::test_ovo_decision_function - assert 145 > 146

Luis-Varona · 2025-04-23T14:37:28Z

There is one test failing in https://github.com/scikit-learn/scikit-learn/actions/runs/14620590689/job/41019410025?pr=31228

FAILED tests/test_multiclass.py::test_ovo_decision_function - assert 145 > 146

@MarcBresson this is not my test; I didn't touch it. Presumably if we want to fix this, it should be in another Issue/PR?

Luis-Varona · 2025-04-23T15:23:10Z

@MarcBresson anything else to fix before merging? Do you really want to address the bug with test_ovo_decision_function in this PR, or just make another PR? (We touched neither the OneVsOneClassifier class nor the test_ovo_decision_function function in this PR, so personally I think we should relegate it to another bug fix Issue, but it's up to you 🙂)

Lowered the threshold of unique values in the iris dataset from >146 to >140 in the test_ovo_decision_function test of the test_multiclass suite. Previous CI tests ended up with 145 unique values, resulting in a failing 'assert 145 > 146' check.

Luis-Varona · 2025-04-26T00:35:14Z

@MarcBresson @jeremiedbb I fixed the test_ovo_decision_function check; it was an overly strict requirement on the OneVsOneClassifier decision function. (It was not strictly related in logic to our issues with the OneVsRestClassifier class, but @MarcBresson requested the change, so I made it, if that makes sense. 🙂) All CI tests are now passing; it should be ready to merge, I think, unless you have any further feedback?

Luis-Varona mentioned this pull request Apr 20, 2025

OneVsRestClassifier when all estimators predict a sample belongs to the other classes #31224

Open

Luis-Varona changed the title ~~Fixed #31224 - Division by zero in OneVsRestClassifier's predict_proba method~~ Fixes #31224 - Division by zero in OneVsRestClassifier's predict_proba method Apr 20, 2025

jeremiedbb reviewed Apr 21, 2025

View reviewed changes

sklearn/multiclass.py Outdated Show resolved Hide resolved

Luis-Varona changed the title ~~Fixes #31224 - Division by zero in OneVsRestClassifier's predict_proba method~~ Division by zero in OneVsRestClassifier's predict_proba (tackles, but keeps open, #31224) Apr 21, 2025

Refactor div-by-zero fix; update changelog, tests

7005d2f

Refactored the division-by-zero fix in OneVsRest's predict_proba to use np.divide. Documented the change in the changelog. Added test_ovr_single_label_predict_proba_zero_row in tests to validate the bug fix.

Luis-Varona added 2 commits April 21, 2025 23:08

Merge branch 'main' into 31224-onevsrest-div-by-zero

94745ef

Reformatted code with ruff

4f9fc74

Luis-Varona requested review from jeremiedbb and MarcBresson April 23, 2025 13:43

Merge branch 'main' into 31224-onevsrest-div-by-zero

4891767

MarcBresson reviewed Apr 23, 2025

View reviewed changes

sklearn/tests/test_multiclass.py Outdated Show resolved Hide resolved

sklearn/tests/test_multiclass.py Outdated Show resolved Hide resolved

sklearn/tests/test_multiclass.py Show resolved Hide resolved

sklearn/tests/test_multiclass.py Outdated Show resolved Hide resolved

Removed leftover test; refactored zero indexing

caffd8d

Removed test (test_ovr_multilabel_predict_proba_zero_row) accidentally left over from prototyping. Changed zero indexing logic in test_ovr_single_label_predict_proba_zero_row to be more streamlined.

Luis-Varona requested a review from MarcBresson April 23, 2025 14:22

Luis-Varona added 2 commits April 25, 2025 20:28

Merge branch 'main' into 31224-onevsrest-div-by-zero

eb6f284

Luis-Varona changed the title ~~Division by zero in OneVsRestClassifier's predict_proba (tackles, but keeps open, #31224)~~ Division by zero in OneVsRest's predict_proba (see #31224); bugfix for test_ovo_decision_function Apr 26, 2025

Merge branch 'main' into 31224-onevsrest-div-by-zero

9c93701

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Division by zero in OneVsRest's `predict_proba` (see #31224); bugfix for `test_ovo_decision_function` #31228

Division by zero in OneVsRest's `predict_proba` (see #31224); bugfix for `test_ovo_decision_function` #31228

Luis-Varona commented Apr 20, 2025

github-actions bot commented Apr 20, 2025 •

edited

Loading

jeremiedbb left a comment

Luis-Varona commented Apr 21, 2025

Luis-Varona commented Apr 22, 2025

MarcBresson commented Apr 23, 2025

Luis-Varona commented Apr 23, 2025

Luis-Varona commented Apr 23, 2025

Luis-Varona commented Apr 26, 2025

Division by zero in OneVsRest's predict_proba (see #31224); bugfix for test_ovo_decision_function #31228

Are you sure you want to change the base?

Division by zero in OneVsRest's predict_proba (see #31224); bugfix for test_ovo_decision_function #31228

Conversation

Luis-Varona commented Apr 20, 2025

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

github-actions bot commented Apr 20, 2025 • edited Loading

✔️ Linting Passed

jeremiedbb left a comment

Choose a reason for hiding this comment

Luis-Varona commented Apr 21, 2025

Luis-Varona commented Apr 22, 2025

MarcBresson commented Apr 23, 2025

Luis-Varona commented Apr 23, 2025

Luis-Varona commented Apr 23, 2025

Luis-Varona commented Apr 26, 2025

Division by zero in OneVsRest's `predict_proba` (see #31224); bugfix for `test_ovo_decision_function` #31228

Division by zero in OneVsRest's `predict_proba` (see #31224); bugfix for `test_ovo_decision_function` #31228

github-actions bot commented Apr 20, 2025 •

edited

Loading