ENH Add replace_undefined_by to accuracy_score #31187
Conversation
nits, otherwise LGTM.
sklearn/metrics/_classification.py (outdated):

    thus ill-defined. Can take the following values:

    - `np.nan` to return `np.nan`
    - a floating point value in the range of [0.0, 1.0] or int 0
Suggested change:

    before: - a floating point value in the range of [0.0, 1.0] or int 0
    after:  - a floating point value in the range of $[0.0, 1.0]$ or `int` 0

I think that makes it look math-like.
Guillaume once told me we don't use LaTeX in docstrings, so I will use backticks instead.
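For illustration, a minimal usage sketch of the parameter described in the docstring lines quoted above; it assumes the branch from this PR is installed, since `replace_undefined_by` is not part of a released `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# With empty inputs the fraction of correct predictions is 0/0, i.e. undefined.
y_true = np.array([])
y_pred = np.array([])

# Return np.nan for the undefined case ...
print(accuracy_score(y_true, y_pred, replace_undefined_by=np.nan))

# ... or substitute a fixed value from the range [0.0, 1.0].
print(accuracy_score(y_true, y_pred, replace_undefined_by=0.0))
```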
sklearn/metrics/_classification.py (outdated):

            "defaults to 0 when `normalize=False` is set."
        )
        warnings.warn(msg, UndefinedMetricWarning, stacklevel=2)
        return replace_undefined_by if math.isnan(replace_undefined_by) else 0
Suggested change:

    before: return replace_undefined_by if math.isnan(replace_undefined_by) else 0
    after:  return replace_undefined_by if np.isnan(replace_undefined_by) else 0

since we're using `np.nan`?
Both are possible, but we're replacing `np.isnan` with `math.isnan` for the array API, so we can already do it here, I think.
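For context, a small sketch of the practical difference between the two checks (independent of this PR): both accept a scalar NaN, but only `np.isnan` broadcasts over arrays, while `math.isnan` stays free of any NumPy involvement.

```python
import math

import numpy as np

x = float("nan")

# Both report True for a scalar NaN.
print(math.isnan(x))  # True
print(np.isnan(x))    # True

# np.isnan also works elementwise on arrays ...
print(np.isnan(np.array([0.0, np.nan])))  # [False  True]

# ... while math.isnan only accepts scalars and would raise a TypeError here:
# math.isnan(np.array([0.0, np.nan]))
```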
@@ -3081,7 +3114,7 @@ def hamming_loss(y_true, y_pred, *, sample_weight=None):

     Returns
     -------
-    loss : float or int
+    loss : float
The return types are changed here. I agree it should always be float for all of them, but it would be nice to have a test for all the cases to make sure it's actually a float.
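A rough sketch of what such a return-type check could look like; the test name and the parametrized metric list are hypothetical, not the actual test added in #30575:

```python
import numpy as np
import pytest

from sklearn.metrics import accuracy_score, hamming_loss, zero_one_loss


# Hypothetical parametrization; the real test lives elsewhere.
@pytest.mark.parametrize("metric", [accuracy_score, hamming_loss, zero_one_loss])
def test_metric_returns_python_float(metric):
    y_true = np.array([0, 1, 1, 0])
    y_pred = np.array([0, 1, 0, 0])
    assert isinstance(metric(y_true, y_pred), float)
```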
I will do this in a separate PR.
These tests had already been added in #30575.
Thank you for your review, @adrinjalali.
I have addressed all your comments. For testing whether all the classification metrics indeed return floats, I will open a separate PR.
Reference Issues/PRs
towards #29048
What does this implement/fix? Explain your changes.
This PR adds a `replace_undefined_by` param to `accuracy_score` to deal with empty `y_true` and `y_pred`. It also adds tests.
Open Question
Note that before this PR, `accuracy_score` returned inconsistent results for the following two calls:

    accuracy_score(np.array([]), np.array([]))
    accuracy_score(np.array([]), np.array([]), normalize=False)

I would like to consider this inconsistency a bug and fix it with this PR for the next release, without a deprecation cycle, so it lands faster. Would this be okay? How do you see that, @adrinjalali?
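A self-contained snippet to reproduce the two calls in question; the pre-PR return values are only printed, not asserted, since they are exactly what this PR changes:

```python
import numpy as np

from sklearn.metrics import accuracy_score

# Both inputs are empty, so the fraction of correct predictions is 0/0.
print(accuracy_score(np.array([]), np.array([])))
print(accuracy_score(np.array([]), np.array([]), normalize=False))
```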