-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Division by zero in OneVsRest's predict_proba
(see #31224); bugfix for test_ovo_decision_function
#31228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Division by zero in OneVsRest's predict_proba
(see #31224); bugfix for test_ovo_decision_function
#31228
Conversation
When a classifier has zero confidence in a class, zero rows appear in the predicted probability matrix, resulting in undefined behavior during probability normalization. OneVsRestClassifier's predict_proba method attempts to divide by zero row sums. This fixes that behavior by setting all zero row sums to 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Luis-Varona. Let's use numpy's divide function which allows to specify where to divide.
Please add a test in sklearn/tests/test_multiclass.py
and an entry in the changelog (see https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md).
Will do. 🙂 |
predict_proba
(tackles, but keeps open, #31224)
Refactored the division-by-zero fix in OneVsRest's predict_proba to use np.divide. Documented the change in the changelog. Added test_ovr_single_label_predict_proba_zero_row in tests to validate the bug fix.
@jeremiedbb @MarcBresson I've made the requested changes. (The only thing I'm worried about is whether |
Removed test (test_ovr_multilabel_predict_proba_zero_row) accidentally left over from prototyping. Changed zero indexing logic in test_ovr_single_label_predict_proba_zero_row to be more streamlined.
There is one test failing in https://github.com/scikit-learn/scikit-learn/actions/runs/14620590689/job/41019410025?pr=31228
|
@MarcBresson this is not my test; I didn't touch it. Presumably if we want to fix this, it should be in another Issue/PR? |
@MarcBresson anything else to fix before merging? Do you really want to address the bug with |
Lowered the threshold of unique values in the iris dataset from >146 to >140 in the test_ovo_decision_function test of the test_multiclass suite. Previous CI tests ended up with 145 unique values, resulting in a failing 'assert 145 > 146' check.
@MarcBresson @jeremiedbb I fixed the |
predict_proba
(tackles, but keeps open, #31224)predict_proba
(see #31224); bugfix for test_ovo_decision_function
When a classifier has zero confidence in a class, zero rows appear in the predicted probability matrix, resulting in undefined behavior during probability normalization. OneVsRestClassifier's predict_proba method attempts to divide by zero row sums:
This fixes that behavior by setting all zero row sums to 1:
See Issue #31224 for more details.
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Any other comments?