ENH Add zero_division parameter for accuracy_score #29213
Conversation
Introduce the zero_division parameter to the accuracy_score function when y_true and y_pred are empty.
Revise the method for calculating the number of elements in y_pred and y_true.
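The semantics being proposed can be sketched as a minimal stand-alone function. This is an illustration of the intended behavior only, not scikit-learn's actual implementation; the function name is made up for the example:

```python
import warnings

import numpy as np


def accuracy_sketch(y_true, y_pred, zero_division="warn"):
    """Illustrative sketch of the proposed `zero_division` semantics."""
    if len(y_true) == 0 and len(y_pred) == 0:
        # Accuracy is ill-defined for empty inputs:
        # return the configured fallback value instead of dividing by zero.
        if zero_division == "warn":
            warnings.warn(
                "accuracy is ill-defined for empty inputs; returning 0.0"
            )
            return 0.0
        return float(zero_division)  # 0, 1, or np.nan
    # Regular case: fraction of matching labels.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

With empty inputs, `zero_division=1` yields `1.0`, `zero_division=np.nan` yields NaN, and the default `"warn"` yields `0.0` plus a warning.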
Hi @Jaimin020, regarding the CodeCov failures: you need to add …
I'll leave you some thoughts that might be helpful for furthering this PR, @Jaimin020.
The type of this PR would not be Fix, but ENH, and you need to add an entry into doc/whats_new/v1.6.rst as well.
sklearn/metrics/_classification.py
Outdated
@@ -154,10 +154,16 @@ def _check_targets(y_true, y_pred):
        "y_pred": ["array-like", "sparse matrix"],
        "normalize": ["boolean"],
        "sample_weight": ["array-like", None],
        "zero_division": [
            Options(Real, {0, 1}),
I think you have to include np.nan here.
Done.
Nice, though the input we would expect from users would not be the str "nan", but the float value np.nan, which you can include like this:

Suggested change:
- Options(Real, {0, 1}),
+ Options(Real, {0, 1, np.nan}),

Then you need to delete the "nan" in line 159 again.
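NaN needs this special care in validation because `np.nan != np.nan`: Python's `in` test on a set checks identity before equality, so the `np.nan` singleton passes `value in {0, 1, np.nan}`, while a different NaN object (e.g. `float("nan")`) would not. A hedged sketch of NaN-aware validation — this is not scikit-learn's actual `Options` implementation, just an illustration of the pitfall:

```python
import math


def is_valid_zero_division(value):
    """Sketch: validate `zero_division` against {"warn", 0, 1, nan}."""
    if isinstance(value, str):
        return value == "warn"
    if isinstance(value, float) and math.isnan(value):
        # Accept any NaN object, not just the np.nan singleton,
        # since NaN never compares equal to itself.
        return True
    return value in (0, 1)  # covers 0.0 and 1.0 via numeric equality
```

This is why the parameter constraint needs `np.nan` listed explicitly (or dedicated NaN handling) rather than relying on plain equality.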
sklearn/metrics/_classification.py
Outdated
        Sets the value to return when there is a zero division.

        Notes:
        - If set to "warn", this acts like 0, but a warning is also raised.
To my ears "acts like 0" is not very clear. A suggestion:

Suggested change:
- If set to "warn", this acts like 0, but a warning is also raised.
+ If set to "warn", this behaves like a 0.0 input, but a warning is also raised.
Done
sklearn/metrics/_classification.py
Outdated
    if len_y_true == 0 and len_y_pred == 0:
        score = _check_zero_division(zero_division)
        if zero_division == "warn":
            _warn_prf(None, "Predcited", "Accuracy is", 0)
Suggested change:
- _warn_prf(None, "Predcited", "Accuracy is", 0)
+ _warn_prf(None, "predicted", "`accuracy_score` is", 0)

I think we should be precise and tell our users exactly where the warning originates from.
Done
Added test cases for the 'zero_division' parameter in the accuracy_score function. Additionally, minor bugs were fixed as noted by the reviewer, and 'doc/whats_new/v1.6.rst' has been updated.
Hi @StefanieSenger, I have made all the changes you mentioned.
The modifications are complete and all tests pass.
Nice, thank you, @Jaimin020.
I've made a little comment about the param validation of the np.nan value. Apart from that, it looks good to me. I'm not a maintainer though. @glemaitre, maybe you want to have a look?
Hi @StefanieSenger and @glemaitre, I've implemented all the changes suggested by @StefanieSenger. @glemaitre, please review this PR and let me know if any further modifications are needed.
sklearn/metrics/_classification.py
Outdated
@@ -154,10 +154,16 @@ def _check_targets(y_true, y_pred):
        "y_pred": ["array-like", "sparse matrix"],
        "normalize": ["boolean"],
        "sample_weight": ["array-like", None],
        "zero_division": [
            Options(Real, {0, 1, np.nan}),
Suggested change:
- Options(Real, {0, 1, np.nan}),
+ Options(Real, {0.0, 1.0, np.nan}),
Done
doc/whats_new/v1.6.rst
Outdated
:mod:`sklearn.metrics`
..............................
Suggested change:
- :mod:`sklearn.metrics`
- ..............................
+ :mod:`sklearn.metrics`
+ ......................
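For context on this suggestion: reStructuredText only requires a section underline to be at least as long as the heading text, so the longer row of dots was not a syntax error; trimming it to exactly match the title is a style cleanup, consistent with the surrounding changelog entries. Both of these are valid reST:

```rst
:mod:`sklearn.metrics`
..............................

:mod:`sklearn.metrics`
......................
```

An underline shorter than the title, by contrast, makes Sphinx emit a "Title underline too short" warning.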
Done
sklearn/metrics/_classification.py
Outdated
    # Check y_true and y_pred is empty
    len_y_true = _num_samples(y_true)
    len_y_pred = _num_samples(y_pred)

    if len_y_true == 0 and len_y_pred == 0:
        score = _check_zero_division(zero_division)
        if zero_division == "warn":
            _warn_prf(None, "predicted", "accuracy_score is", 0)
        return score

    # Compute accuracy for each possible representation
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    check_consistent_length(y_true, y_pred, sample_weight)
We should first do the checking of the vectors to raise a proper error. Then we only need to check a single vector, because we have already checked the vectors for consistency. I would also avoid using _warn_prf, because that name is really tied to precision-recall-fscore. So let's just warn directly.

Suggested change:
- # Check y_true and y_pred is empty
- len_y_true = _num_samples(y_true)
- len_y_pred = _num_samples(y_pred)
-
- if len_y_true == 0 and len_y_pred == 0:
-     score = _check_zero_division(zero_division)
-     if zero_division == "warn":
-         _warn_prf(None, "predicted", "accuracy_score is", 0)
-     return score
-
- # Compute accuracy for each possible representation
- y_type, y_true, y_pred = _check_targets(y_true, y_pred)
- check_consistent_length(y_true, y_pred, sample_weight)
+ # Compute accuracy for each possible representation
+ y_type, y_true, y_pred = _check_targets(y_true, y_pred)
+ check_consistent_length(y_true, y_pred, sample_weight)
+
+ if len(y_true) == 0:  # empty vectors
+     if zero_division == "warn":
+         msg = (
+             "accuracy() is ill-defined and set to 0.0. Use the `zero_division` "
+             "param to control this behavior."
+         )
+         warnings.warn(msg, UndefinedMetricWarning)
+     return _check_zero_division(zero_division)
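The reordering the reviewer asks for can be sketched end-to-end as a stand-alone function. Assumptions in this sketch: the `UndefinedMetricWarning` class below is a stand-in for `sklearn.exceptions.UndefinedMetricWarning`, and the explicit length comparison stands in for `check_consistent_length`; the function name is made up:

```python
import warnings

import numpy as np


class UndefinedMetricWarning(UserWarning):
    """Stand-in for sklearn.exceptions.UndefinedMetricWarning."""


def accuracy_checked(y_true, y_pred, zero_division="warn"):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # 1) Validate first, so inconsistent inputs raise a proper error
    #    even when one of them is empty.
    if y_true.shape[0] != y_pred.shape[0]:
        raise ValueError(
            "Found input variables with inconsistent numbers of samples: "
            f"{y_true.shape[0]} != {y_pred.shape[0]}"
        )
    # 2) After the consistency check, testing a single vector suffices.
    if y_true.shape[0] == 0:
        if zero_division == "warn":
            warnings.warn(
                "accuracy() is ill-defined and set to 0.0. Use the "
                "`zero_division` param to control this behavior.",
                UndefinedMetricWarning,
            )
            return 0.0
        return float(zero_division)
    return float(np.mean(y_true == y_pred))
```

The key design point is the ordering: validating consistency first means a call like `accuracy_checked([1], [])` raises a clear `ValueError` rather than silently hitting the zero-division branch.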
Done
sklearn/metrics/_classification.py
Outdated
        Sets the value to return when there is a zero division.

        Notes:
        - If set to "warn", this behaves like a 0.0 input, but a warning is also
          raised.
Suggested change:
- Sets the value to return when there is a zero division.
- Notes:
- - If set to "warn", this behaves like a 0.0 input, but a warning is also
-   raised.
+ Sets the value to return when there is a zero division, e.g. when `y_true`
+ and `y_pred` are empty. If set to "warn", returns 0.0, but a warning is
+ also raised.
+
+ .. versionadded:: 1.6
Done
@@ -215,6 +215,29 @@ def test_classification_report_zero_division_warning(zero_division):
    assert not record


def test_accuracy_score_zero_division_warning():
Can we use the test_zero_division_nan_no_warning and test_zero_division_nan_warning tests that already exist?
I have revised the test case according to your suggestions. Please review it.
All modifications done.
Hi @glemaitre, I have implemented all the changes you requested. Please review.
LGTM @Jaimin020.
I quickly pushed a commit to merge main into the branch and just modified the message related to skipping the test, which is rather a nitpick.
We will need a second approval.
Thanks @glemaitre for the quick update and merge with the main branch.
Introduce the zero_division parameter to the accuracy_score function when y_true and y_pred are empty.
Reference Issues/PRs
Make zero_division parameter consistent in the different metric #29048 (Task-1)
What does this implement/fix? Explain your changes.
I check the lengths of `y_true` and `y_pred`. If both lengths are 0, the output is generated based on the value of the `zero_division` parameter.
Any other comments?