Skip to content

ENH Add replace_undefined_by to accuracy_score #31187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

StefanieSenger
Copy link
Contributor

@StefanieSenger StefanieSenger commented Apr 12, 2025

Reference Issues/PRs

towards #29048

What does this implement/fix? Explain your changes.

This PR adds a replace_undefined_by param to accuracy_score to deal with empty y_true and y_pred.
Also adds tests.

Open Question

Note that before this PR accuracy_score returned like this:
accuracy_score(np.array([]), np.array([]))

nan

accuracy_score(np.array([]), np.array([]), normalize=False)

0.0

I would like to consider this inconsistency as a bug and fix this with this PR for the next release without deprecation, so it comes faster. Would this be okay? How would you see that, @adrinjalali?

Copy link

github-actions bot commented Apr 12, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a62e487. Link to the linter CI: here

@StefanieSenger StefanieSenger added this to the 1.7 milestone Apr 12, 2025
@StefanieSenger StefanieSenger marked this pull request as draft April 12, 2025 17:03
@StefanieSenger StefanieSenger marked this pull request as ready for review April 12, 2025 17:18
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits, otherwise LGTM.

thus ill-defined. Can take the following values:

- `np.nan` to return `np.nan`
- a floating point value in the range of [0.0, 1.0] or int 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- a floating point value in the range of [0.0, 1.0] or int 0
- a floating point value in the range of $[0.0, 1.0]$ or `int` 0

I think that makes it math like

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guillaume once told me we don't use latex in docstrings. So I will use back ticks instead.

"defaults to 0 when `normalize=False` is set."
)
warnings.warn(msg, UndefinedMetricWarning, stacklevel=2)
return replace_undefined_by if math.isnan(replace_undefined_by) else 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return replace_undefined_by if math.isnan(replace_undefined_by) else 0
return replace_undefined_by if np.isnan(replace_undefined_by) else 0

since we're using np.nan?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both is possible, but we're replacing np.isnan with math.isnan for array api, so we can already do it here, I think.

@@ -3081,7 +3114,7 @@ def hamming_loss(y_true, y_pred, *, sample_weight=None):

Returns
-------
loss : float or int
loss : float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there return types are changed. I agree it should be always float for all of them, but it'd be nice to have a test for all the cases to make sure it's actually float.

Copy link
Contributor Author

@StefanieSenger StefanieSenger Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this in a separate PR.

These tests had already been added in #30575.

Copy link
Contributor Author

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review, @adrinjalali.

I have addressed all your comments. For testing if all the classification metrics indeed return floats, I will open a separate PR.

thus ill-defined. Can take the following values:

- `np.nan` to return `np.nan`
- a floating point value in the range of [0.0, 1.0] or int 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guillaume once told me we don't use latex in docstrings. So I will use back ticks instead.

"defaults to 0 when `normalize=False` is set."
)
warnings.warn(msg, UndefinedMetricWarning, stacklevel=2)
return replace_undefined_by if math.isnan(replace_undefined_by) else 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both is possible, but we're replacing np.isnan with math.isnan for array api, so we can already do it here, I think.

@@ -3081,7 +3114,7 @@ def hamming_loss(y_true, y_pred, *, sample_weight=None):

Returns
-------
loss : float or int
loss : float
Copy link
Contributor Author

@StefanieSenger StefanieSenger Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this in a separate PR.

These tests had already been added in #30575.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants