ENH Add zero_division parameter for accuracy_score #29213


Merged: 25 commits into scikit-learn:main, Oct 22, 2024

Conversation

@Jaimin020 (Contributor) commented Jun 7, 2024

Introduce the zero_division parameter to the accuracy_score function for the case where y_true and y_pred are empty.

Reference Issues/PRs

Make zero_division parameter consistent in the different metric #29048 (Task-1)

What does this implement/fix? Explain your changes.

I have verified the lengths of y_true and y_pred. If both lengths are 0, the output is produced based on the value of the zero_division parameter.
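As a hedged sketch of the intended semantics (a toy function, not the actual scikit-learn implementation; math.nan stands in for np.nan):

```python
import math


def accuracy_sketch(y_true, y_pred, zero_division=0.0):
    """Toy stand-in for accuracy_score illustrating the empty-input branch."""
    if len(y_true) == 0 and len(y_pred) == 0:
        # Accuracy is ill-defined: fall back to the user-chosen value.
        return zero_division
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


print(accuracy_sketch([1, 0, 1], [1, 1, 1]))        # 0.666...
print(accuracy_sketch([], [], zero_division=1.0))   # 1.0
print(math.isnan(accuracy_sketch([], [], zero_division=math.nan)))  # True
```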

Any other comments?

Introduce the zero_division parameter to the accuracy_score function when y_true and y_pred are empty.

github-actions bot commented Jun 7, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f155899.

@Jaimin020 changed the title from Addition of "zero_division" to Addition of "zero_division for accuracy_score, Jun 7, 2024
@Jaimin020 changed the title from Addition of "zero_division for accuracy_score to Addition of "zero_division" for accuracy_score, Jun 7, 2024
Jaimin020 added 3 commits June 7, 2024 23:28
Revise the method for calculating the number of elements in y_pred and y_true.
@Jaimin020 changed the title from Addition of "zero_division" for accuracy_score to Fix: Addition of "zero_division" for accuracy_score, Jun 8, 2024
@StefanieSenger (Contributor)

Hi @Jaimin020,

Regarding the CodeCov failures: you need to add accuracy_score to the tests in test_classification.py. The CI output does not seem very helpful here, however; normally the precise lines of uncovered code show up in the "Files changed" tab.

@StefanieSenger (Contributor) left a comment

I'll leave you some thoughts that might be helpful for furthering this PR, @Jaimin020.

The type of this PR would be ENH rather than Fix, and you need to add an entry to doc/whats_new/v1.6.rst as well.

@@ -154,10 +154,16 @@ def _check_targets(y_true, y_pred):
"y_pred": ["array-like", "sparse matrix"],
"normalize": ["boolean"],
"sample_weight": ["array-like", None],
"zero_division": [
Options(Real, {0, 1}),
Contributor:

I think you have to include np.nan here.

Contributor Author:

Done.

Contributor:

Nice, though the input we would expect from users would not be the str "nan", but the float value np.nan, which you can include like this:

Suggested change
Options(Real, {0, 1}),
Options(Real, {0, 1, np.nan}),

Then you need to delete the "nan" in line 159 again.

Sets the value to return when there is a zero division.

Notes:
- If set to "warn", this acts like 0, but a warning is also raised.
Contributor:

To my ears "acts like 0" is not very clear. A suggestion:

Suggested change
- If set to "warn", this acts like 0, but a warning is also raised.
- If set to "warn", this behaves like a 0.0 input, but a warning is also
raised.

Contributor Author:

Done

if len_y_true == 0 and len_y_pred == 0:
score = _check_zero_division(zero_division)
if zero_division == "warn":
_warn_prf(None, "Predcited", "Accuracy is", 0)
@StefanieSenger (Contributor), Jun 10, 2024:

Suggested change
_warn_prf(None, "Predcited", "Accuracy is", 0)
_warn_prf(None, "predicted", "`accuracy_score` is", 0)

I think we should be precise and tell our users exactly where the warning originates from.

Contributor Author:

Done

@Jaimin020 changed the title from Fix: Addition of "zero_division" for accuracy_score to ENH: Addition of "zero_division" for accuracy_score, Jun 10, 2024
Added test cases for the 'zero_division' parameter in the accuracy_score function. Additionally, minor bugs were fixed as noted by the reviewer, and 'doc/whats_new/v1.6.rst' has been updated.
@Jaimin020 (Contributor Author) commented Jun 10, 2024

Hi @StefanieSenger,

I have made all the changes you mentioned.

@Jaimin020 changed the title from ENH: Addition of "zero_division" for accuracy_score to ENH: Addition of "zero_division" parameter for accuracy_score, Jun 10, 2024
The modification is completed to pass all tests.
@StefanieSenger (Contributor) left a comment

Nice, thank you, @Jaimin020.
I've made a little comment about the param validation of the np.nan value. Apart from that it looks good to me. I'm not a maintainer though. @glemaitre, maybe you want to have a look?


@Jaimin020 (Contributor Author):

Hi @StefanieSenger and @glemaitre,

I've implemented all the changes suggested by @StefanieSenger. @glemaitre, please review this PR and let me know if any further modifications are needed.

@glemaitre glemaitre self-requested a review June 12, 2024 15:22
@glemaitre changed the title from ENH: Addition of "zero_division" parameter for accuracy_score to ENH Add zero_division parameter for accuracy_score, Jun 12, 2024
@glemaitre glemaitre added this to the 1.6 milestone Jun 13, 2024
@@ -154,10 +154,16 @@ def _check_targets(y_true, y_pred):
"y_pred": ["array-like", "sparse matrix"],
"normalize": ["boolean"],
"sample_weight": ["array-like", None],
"zero_division": [
Options(Real, {0, 1, np.nan}),
Member:

Suggested change
Options(Real, {0, 1, np.nan}),
Options(Real, {0.0, 1.0, np.nan}),

Contributor Author:

Done
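The {0.0, 1.0, np.nan} tweak touches a classic pitfall: nan compares unequal to itself, so plain set membership only finds the very same nan object. A small stdlib-only illustration (math.nan stands in for np.nan here; scikit-learn's Options constraint has its own nan handling, so this is just background):

```python
import math

nan = math.nan
allowed = {0.0, 1.0, nan}

# CPython membership tests identity before equality, so the very same
# nan object is found even though nan != nan ...
print(nan in allowed)           # True
# ... but a freshly created nan is a different object and is not found.
print(float("nan") in allowed)  # False
```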

Comment on lines 141 to 142
:mod:`sklearn.metrics`
..............................
Member:

Suggested change
:mod:`sklearn.metrics`
..............................
:mod:`sklearn.metrics`
......................

Contributor Author:

Done

Comment on lines 234 to 246
# Check y_true and y_pred is empty
len_y_true = _num_samples(y_true)
len_y_pred = _num_samples(y_pred)

if len_y_true == 0 and len_y_pred == 0:
score = _check_zero_division(zero_division)
if zero_division == "warn":
_warn_prf(None, "predicted", "accuracy_score is", 0)
return score

# Compute accuracy for each possible representation
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
check_consistent_length(y_true, y_pred, sample_weight)
Member:

We should first run the checks on the vectors to raise a proper error. Then we only have to check a single vector, because we already check for consistency between the vectors. I would also avoid using _warn_prf, because the name is really tied to precision-recall-fscore.

So let's just warn directly.

Suggested change
# Check y_true and y_pred is empty
len_y_true = _num_samples(y_true)
len_y_pred = _num_samples(y_pred)
if len_y_true == 0 and len_y_pred == 0:
score = _check_zero_division(zero_division)
if zero_division == "warn":
_warn_prf(None, "predicted", "accuracy_score is", 0)
return score
# Compute accuracy for each possible representation
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
check_consistent_length(y_true, y_pred, sample_weight)
# Compute accuracy for each possible representation
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
check_consistent_length(y_true, y_pred, sample_weight)
if len(y_true) == 0: # empty vectors
if zero_division == "warn":
msg = (
f"accuracy() is ill-defined and set to 0.0. Use the `zero_division` "
"param to control this behavior."
)
warnings.warn(msg, UndefinedMetricWarning)
return _check_zero_division(zero_division)
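As a hedged illustration of the warn branch in the suggestion above (the helper and the local UndefinedMetricWarning class are stand-ins, not scikit-learn code), the warning can be observed with warnings.catch_warnings:

```python
import warnings


class UndefinedMetricWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.UndefinedMetricWarning."""


def empty_input_score(zero_division="warn"):
    # Mirrors the suggested early return for empty y_true/y_pred.
    if zero_division == "warn":
        msg = (
            "accuracy() is ill-defined and set to 0.0. Use the "
            "`zero_division` param to control this behavior."
        )
        warnings.warn(msg, UndefinedMetricWarning)
        return 0.0
    return zero_division


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = empty_input_score("warn")

print(score)                                                   # 0.0
print(issubclass(caught[0].category, UndefinedMetricWarning))  # True
```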

Contributor Author:

Done

Comment on lines 191 to 195
Sets the value to return when there is a zero division.

Notes:
- If set to "warn", this behaves like a 0.0 input, but a warning is also
raised.
Member:

Suggested change
Sets the value to return when there is a zero division.
Notes:
- If set to "warn", this behaves like a 0.0 input, but a warning is also
raised.
Sets the value to return when there is a zero division, e.g. when `y_true` and `y_pred`
are empty. If set to "warn", this returns 0.0, but a warning is also raised.
.. versionadded:: 1.6

Contributor Author:

Done

@@ -215,6 +215,29 @@ def test_classification_report_zero_division_warning(zero_division):
assert not record


def test_accuracy_score_zero_division_warning():
Member:

Can we use the test_zero_division_nan_no_warning and test_zero_division_nan_warning tests that already exist?
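Those existing tests are not shown in this diff; as a rough, stdlib-only sketch of the pattern they follow (the toy metric and the checks here are invented for illustration, not scikit-learn's actual test code):

```python
import math
import warnings


def accuracy_sketch(y_true, y_pred, zero_division="warn"):
    # Toy metric: only the empty-input branch matters for this check.
    if len(y_true) == 0:
        if zero_division == "warn":
            warnings.warn("ill-defined accuracy set to 0.0", UserWarning)
            return 0.0
        return zero_division
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Pattern of test_zero_division_nan_no_warning: explicit values raise nothing.
for value in (0.0, 1.0, math.nan):
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")
        result = accuracy_sketch([], [], zero_division=value)
    assert not record
    assert math.isnan(result) if math.isnan(value) else result == value

# Pattern of test_zero_division_nan_warning: the default "warn" does raise.
with warnings.catch_warnings(record=True) as record:
    warnings.simplefilter("always")
    assert accuracy_sketch([], []) == 0.0
assert record
```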

Contributor Author:

I have revised the test case according to your suggestions. Please review it.

All modifications done.
@Jaimin020 (Contributor Author):

Hi @glemaitre,

I have implemented all the changes you requested. Please review it.

@glemaitre glemaitre self-requested a review July 22, 2024 08:15
@glemaitre (Member) left a comment

LGTM @Jaimin020.

I quickly pushed a commit to merge main into the branch and modified the message related to skipping the test, which is rather a nitpick.

We will need a second approval.

@Jaimin020 (Contributor Author):

> LGTM @Jaimin020.
>
> I quickly pushed a commit to merge main into the branch and modified the message related to skipping the test, which is rather a nitpick.
>
> We will need a second approval.

Thanks @glemaitre for the quick update and merge with the main branch.

@adrinjalali adrinjalali enabled auto-merge (squash) October 22, 2024 13:39
@adrinjalali adrinjalali merged commit 99f0f69 into scikit-learn:main Oct 22, 2024
28 checks passed