Skip to content

Make scorers return python floats #30575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 3, 2025

Conversation

jeremiedbb
Copy link
Member

closes #27339

Copy link

github-actions bot commented Jan 3, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d7f57b6. Link to the linter CI: here

@jeremiedbb jeremiedbb marked this pull request as ready for review January 22, 2025 14:39
@jeremiedbb jeremiedbb changed the title [WIP] Make scorers return python floats Make scorers return python floats Jan 22, 2025
@jeremiedbb
Copy link
Member Author

I've used the existing common tests for metrics. When doing that I found that it's not very intuitive and lacks a check to make sure we're not forgetting any (spoiler: we are). I might take a shot at modernizing it a bit like we did for the common tests but in a later PR.

Since there are already many occurrences where we convert to float and it was done without changelog entry, I think it's fine to not add a changelog entry for the remaing ones. It's just a difference in the repr after all.

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @jeremiedbb

@OmarManzoor OmarManzoor added the Waiting for Second Reviewer First reviewer is done, need a second one! label Jan 27, 2025
@@ -2732,7 +2732,7 @@ Here is a small example of usage of the :func:`max_error` function::
>>> y_true = [3, 2, 7, 1]
>>> y_pred = [9, 2, 7, 1]
>>> max_error(y_true, y_pred)
np.int64(6)
6.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change from a numpy scalar with an int dtype to a Python float. This may be a bit more suprising than from a numpy scalar with a float dtype to a Python float.

I spent 5 minutes trying to find a way it could have unintended side-effects but I could not find anything. Maybe somebody else wants to think about it during 5 minutes as well?

For example on main

from sklearn.metrics import max_error
1 / max_error([1, 2], [3, 5])

returns np.float64(0.3333333333333333) on main and 0.3333 with this PR.

I seem to remember there were some differences in numpy scalar handling in numpy 2.0 maybe worth a look about what happens with numpy < 2 ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a reason to not change this (or change to using int/float as appropriate?) is that if you feed integers to max_error you could be expecting to get back integers. A bit like 3 - 1 == 2. But maybe we are being too detailed orientated?

I think for very large integers you have to be careful when converting to floats as some of them can't be represented as float.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is arguably a fix for an inconsistency with the rest of the code base. max_error doesn't always return an int. It happens that if both y_pred and y_true have an int dtype, it returns an int.

We don't enforce this behavior for any other scorer when the usual return type is float but could be an int in some specific setting. For instance, accuracy_score(normalize=False) counts the number of correctly classified samples, yet still returns a float.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the docstring of max_error already states that the return type is float :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well then, no need for an exception :)

Co-authored-by: Tim Head <betatim@gmail.com>
@lesteve
Copy link
Member

lesteve commented Feb 3, 2025

2 approvals already, let's merge this!

@lesteve lesteve merged commit 39cc03f into scikit-learn:main Feb 3, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC should the scikit-learn metrics return a Python scalar or a NumPy scalar?
4 participants