Skip to content

Division by zero in OneVsRest's predict_proba (see #31224); bugfix for test_ovo_decision_function #31228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Luis-Varona
Copy link

When a classifier has zero confidence in a class, zero rows appear in the predicted probability matrix, resulting in undefined behavior during probability normalization. OneVsRestClassifier's predict_proba method attempts to divide by zero row sums:

        .... (precursor logic)
        if not self.multilabel_:
            # Then, probabilities should be normalized to 1.
            Y /= np.sum(Y, axis=1)[:, np.newaxis]
        return Y

This fixes that behavior by setting all zero row sums to 1:

        .... (precursor logic)
        if not self.multilabel_:
            # Then, (nonzero) probabilities should be normalized to 1.
            row_sums = np.sum(Y, axis=1)[:, np.newaxis]
            row_sums[row_sums == 0] = 1 # Avoid division by zero
            Y /= row_sums
        return Y

See Issue #31224 for more details.

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

When a classifier has zero confidence in a class, zero rows appear in the predicted probability matrix, resulting in undefined behavior during probability normalization. OneVsRestClassifier's predict_proba method attempts to divide by zero row sums. This fixes that behavior by setting all zero row sums to 1.
Copy link

github-actions bot commented Apr 20, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 9c93701. Link to the linter CI: here

@Luis-Varona Luis-Varona changed the title Fixed #31224 - Division by zero in OneVsRestClassifier's predict_proba method Fixes #31224 - Division by zero in OneVsRestClassifier's predict_proba method Apr 20, 2025
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @Luis-Varona. Let's use numpy's divide function which allows to specify where to divide.

Please add a test in sklearn/tests/test_multiclass.py and an entry in the changelog (see https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md).

@Luis-Varona
Copy link
Author

Will do. 🙂

@Luis-Varona Luis-Varona changed the title Fixes #31224 - Division by zero in OneVsRestClassifier's predict_proba method Division by zero in OneVsRestClassifier's predict_proba (tackles, but keeps open, #31224) Apr 21, 2025
Refactored the division-by-zero fix in OneVsRest's predict_proba to use np.divide. Documented the change in the changelog. Added test_ovr_single_label_predict_proba_zero_row in tests to validate the bug fix.
@Luis-Varona
Copy link
Author

@jeremiedbb @MarcBresson I've made the requested changes. (The only thing I'm worried about is whether test_ovr_single_label_predict_proba_zero_row is too long of a test function name; we already have test_ovr_single_label_predict_proba, and I wanted to extract the logic into a separate function for obvious reasons.) Let me know if everything is good 🙂

Removed test (test_ovr_multilabel_predict_proba_zero_row) accidentally left over from prototyping. Changed zero indexing logic in test_ovr_single_label_predict_proba_zero_row to be more streamlined.
@Luis-Varona Luis-Varona requested a review from MarcBresson April 23, 2025 14:22
@MarcBresson
Copy link
Contributor

There is one test failing in https://github.com/scikit-learn/scikit-learn/actions/runs/14620590689/job/41019410025?pr=31228

FAILED tests/test_multiclass.py::test_ovo_decision_function - assert 145 > 146

@Luis-Varona
Copy link
Author

There is one test failing in https://github.com/scikit-learn/scikit-learn/actions/runs/14620590689/job/41019410025?pr=31228

FAILED tests/test_multiclass.py::test_ovo_decision_function - assert 145 > 146

@MarcBresson this is not my test; I didn't touch it. Presumably if we want to fix this, it should be in another Issue/PR?

@Luis-Varona
Copy link
Author

@MarcBresson anything else to fix before merging? Do you really want to address the bug with test_ovo_decision_function in this PR, or just make another PR? (We touched neither the OneVsOneClassifier class nor the test_ovo_decision_function function in this PR, so personally I think we should relegate it to another bug fix Issue, but it's up to you 🙂)

Lowered the threshold of unique values in the iris dataset from >146 to >140 in the test_ovo_decision_function test of the test_multiclass suite. Previous CI tests ended up with 145 unique values, resulting in a failing 'assert 145 > 146' check.
@Luis-Varona
Copy link
Author

@MarcBresson @jeremiedbb I fixed the test_ovo_decision_function check; it was an overly strict requirement on the OneVsOneClassifier decision function. (It was not strictly related in logic to our issues with the OneVsRestClassifier class, but @MarcBresson requested the change, so I made it, if that makes sense. 🙂) All CI tests are now passing; it should be ready to merge, I think, unless you have any further feedback?

@Luis-Varona Luis-Varona changed the title Division by zero in OneVsRestClassifier's predict_proba (tackles, but keeps open, #31224) Division by zero in OneVsRest's predict_proba (see #31224); bugfix for test_ovo_decision_function Apr 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants