
FIX Raise on empty inputs in accuracy_score #31187


Draft · wants to merge 8 commits into base: main
Conversation

@StefanieSenger (Contributor) commented Apr 12, 2025

Reference Issues/PRs

towards #29048

What does this implement/fix? Explain your changes.

This PR adds a `replace_undefined_by` parameter to `accuracy_score` to handle empty `y_true` and `y_pred`. It also adds tests.
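
For illustration, this is how the proposed parameter could be used (a sketch based on this PR's proposed signature; `replace_undefined_by` is not part of any released scikit-learn API):

```python
import numpy as np

from sklearn.metrics import accuracy_score

# Proposed usage per this PR (not a released API): the caller chooses what
# an ill-defined accuracy on empty inputs should evaluate to.
accuracy_score(np.array([]), np.array([]), replace_undefined_by=np.nan)  # -> nan
accuracy_score(np.array([]), np.array([]), replace_undefined_by=0.0)     # -> 0.0
```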

Open Question

Note that before this PR, `accuracy_score` returned inconsistent results for empty inputs:

>>> accuracy_score(np.array([]), np.array([]))
nan
>>> accuracy_score(np.array([]), np.array([]), normalize=False)
0.0

I would like to treat this inconsistency as a bug and fix it in this PR for the next release without a deprecation cycle, so the fix lands sooner. Would that be okay? What do you think, @adrinjalali?

github-actions bot commented Apr 12, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a530be7.

@StefanieSenger StefanieSenger added this to the 1.7 milestone Apr 12, 2025
@StefanieSenger StefanieSenger marked this pull request as draft April 12, 2025 17:03
@StefanieSenger StefanieSenger marked this pull request as ready for review April 12, 2025 17:18
@adrinjalali (Member) left a comment

nits, otherwise LGTM.

thus ill-defined. Can take the following values:
- `np.nan` to return `np.nan`
- a floating point value in the range of [0.0, 1.0] or int 0
@adrinjalali (Member)

Suggested change:
- a floating point value in the range of [0.0, 1.0] or int 0
+ a floating point value in the range of $[0.0, 1.0]$ or `int` 0

I think that makes it math-like.

@StefanieSenger (Contributor, Author)

Guillaume once told me we don't use LaTeX in docstrings, so I will use backticks instead.

"defaults to 0 when `normalize=False` is set."
)
warnings.warn(msg, UndefinedMetricWarning, stacklevel=2)
return replace_undefined_by if math.isnan(replace_undefined_by) else 0
@adrinjalali (Member)

Suggested change:
- return replace_undefined_by if math.isnan(replace_undefined_by) else 0
+ return replace_undefined_by if np.isnan(replace_undefined_by) else 0

since we're using np.nan?

@StefanieSenger (Contributor, Author)

Both are possible, but since we're replacing np.isnan with math.isnan for array API support, I think we can already do it here.
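
For context, a quick illustration of why `math.isnan` works here (standard library plus NumPy only):

```python
import math

import numpy as np

# math.isnan accepts any scalar convertible to float, including NumPy
# scalars, so the check is not tied to NumPy itself -- which matters
# when moving toward array API support.
assert math.isnan(float("nan"))
assert math.isnan(np.float64("nan"))
assert not math.isnan(0.0)
```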

@@ -3081,7 +3114,7 @@ def hamming_loss(y_true, y_pred, *, sample_weight=None):
Returns
-------
- loss : float or int
+ loss : float
@adrinjalali (Member)

The return types are changed here. I agree they should always be float for all of these metrics, but it'd be nice to have a test for all the cases to make sure the result is actually a float.

@StefanieSenger (Contributor, Author) commented Apr 29, 2025

I will do this in a separate PR.

These tests had already been added in #30575.
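
For illustration, such a return-type test could look roughly like this (a minimal sketch; the choice of metrics is illustrative, and the actual tests are the ones added in #30575):

```python
import numpy as np
import pytest

from sklearn.metrics import accuracy_score, hamming_loss, zero_one_loss


# Sketch: each metric should return a float (note np.float64 subclasses
# Python float), never a bare int.
@pytest.mark.parametrize("metric", [accuracy_score, hamming_loss, zero_one_loss])
def test_metric_returns_float(metric):
    y_true = np.array([0, 1, 1, 0])
    y_pred = np.array([0, 1, 0, 0])
    assert isinstance(metric(y_true, y_pred), float)
```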

@StefanieSenger (Contributor, Author) left a comment

Thank you for your review, @adrinjalali.

I have addressed all your comments. For testing if all the classification metrics indeed return floats, I will open a separate PR.


@@ -360,6 +376,18 @@ def accuracy_score(y_true, y_pred, *, normalize=True, sample_weight=None):
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
check_consistent_length(y_true, y_pred, sample_weight)

if _num_samples(y_true) == 0:
Member comment:

As discussed during the dedicated meeting, we believe that the fact that accuracy_score does not raise ValueError when called on empty arrays was an oversight:

  • scikit-learn estimators always raise ValueError when fit on empty data,
  • more importantly, they always raise ValueError when predicting or transforming empty data, so it's not possible to get a working pipeline that would output an empty y_pred array;
  • many other metric functions such as roc_auc, f1_score and mean_squared_error already raise ValueError.

I think we can treat the accuracy_score case as a bug and make it raise ValueError instead (without a deprecation cycle).
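
A sketch of that direction (the helper name is hypothetical; `_num_samples` is the private helper already used in the diff above and is subject to change):

```python
from sklearn.utils.validation import _num_samples  # private helper


def _check_not_empty(y_true):
    # Hypothetical helper: fail fast on empty inputs instead of returning
    # an ill-defined score.
    if _num_samples(y_true) == 0:
        raise ValueError(
            "accuracy_score is undefined for empty inputs; y_true and "
            "y_pred must contain at least one sample."
        )
```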

Member comment:

All other metric functions call check_array or a similar function that rejects empty arrays.

This is apparently not the case for _check_targets used in accuracy_score. We should review the list of calls to _check_targets to check whether we should change this function to:

  • reject empty arrays;
  • maybe even call check_array directly, but then we would need to review the existing input validation to avoid the overhead of costly duplicate checks. (See the sketch of check_array's existing behavior below.)
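
To illustrate the behavior check_array already provides (its `ensure_min_samples` parameter defaults to 1):

```python
import numpy as np

from sklearn.utils import check_array

# check_array rejects empty inputs by default, which is why metrics that
# route their validation through it already raise on empty arrays.
try:
    check_array(np.empty((0, 1)))
except ValueError as exc:
    print(exc)  # Found array with 0 sample(s) ... while a minimum of 1 is required.
```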

@ogrisel ogrisel removed this from the 1.7 milestone May 7, 2025
@ogrisel (Member) commented May 7, 2025

I'm removing the 1.7 milestone for this one because there is no urgency to get the fix for the bug described above into 1.7.

@StefanieSenger (Contributor, Author) commented May 7, 2025

Just to be sure: raising a ValueError on empty inputs also means that accuracy_score would not get a replace_undefined_by parameter, because empty inputs are the only case here that could cause a division by zero.
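
For clarity, the underlying arithmetic (plain NumPy, no scikit-learn internals assumed):

```python
import numpy as np

# With normalize=True, accuracy is n_correct / n_samples, so the score is
# ill-defined exactly when n_samples == 0; no other division in the
# computation can hit a zero denominator.
y_true = np.array([0, 1, 1])
y_pred = np.array([0, 1, 0])
accuracy = np.sum(y_true == y_pred) / len(y_true)  # 2/3, well-defined for n > 0
```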

@StefanieSenger StefanieSenger changed the title ENH Add replace_undefined_by to accuracy_score FIX Raise on empty inputs in accuracy_score May 7, 2025
@StefanieSenger StefanieSenger marked this pull request as draft May 7, 2025 10:04