Make scorers return python floats #30575
Conversation
I've used the existing common tests for metrics. While doing that I found that they're not very intuitive and lack a check to make sure we're not forgetting any metric (spoiler: we are). I might take a shot at modernizing them, like we did for the common estimator tests, in a later PR. Since there are already many places where we convert to float, and that was done without a changelog entry, I think it's fine to not add one for the remaining cases. It's just a difference in the repr, after all.
LGTM. Thanks @jeremiedbb
@@ -2732,7 +2732,7 @@ Here is a small example of usage of the :func:`max_error` function::

      >>> y_true = [3, 2, 7, 1]
      >>> y_pred = [9, 2, 7, 1]
      >>> max_error(y_true, y_pred)
-     np.int64(6)
+     6.0
This is a change from a numpy scalar with an int dtype to a Python float. This may be a bit more surprising than going from a numpy scalar with a float dtype to a Python float.
I spent 5 minutes trying to find a way it could have unintended side-effects but I could not find anything. Maybe somebody else wants to think about it for 5 minutes as well?
For example:

    from sklearn.metrics import max_error
    1 / max_error([1, 2], [3, 5])

returns np.float64(0.3333333333333333) on main, and 0.3333 with this PR.
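The difference is only in the repr; the numeric value is unchanged. A minimal sketch of the comparison (not using scikit-learn itself, just assuming NumPy is installed):

```python
import numpy as np

x = np.float64(1) / 3  # a NumPy scalar, as on main
y = 1 / 3.0            # a plain Python float, as with this PR

# On NumPy >= 2 the scalar repr is wrapped, e.g. np.float64(0.3333333333333333);
# on NumPy < 2 it prints unwrapped, just like the Python float.
print(repr(x))
print(repr(y))  # 0.3333333333333333

# The two compare equal; only the repr differs.
print(x == y)  # True
```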
I seem to remember there were some differences in numpy scalar handling in NumPy 2.0; it may be worth taking a look at what happens with numpy < 2.
...
Maybe a reason to not change this (or to change to using int/float as appropriate?) is that if you feed integers to max_error you could be expecting to get back integers. A bit like 3 - 1 == 2. But maybe we are being too detail-oriented?
I think for very large integers you have to be careful when converting to floats as some of them can't be represented as float.
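That caveat can be seen with plain Python, independent of scikit-learn: integers above 2**53 are not all exactly representable as 64-bit floats.

```python
big = 2**53 + 1  # smallest positive integer not representable as a double

# Converting to float rounds to the nearest representable value,
# so the +1 is silently lost.
print(float(big))             # 9007199254740992.0
print(float(big) == big)      # False: the round-trip is not exact
print(float(2**53) == 2**53)  # True: 2**53 itself is representable
```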
This one is arguably a fix for an inconsistency with the rest of the code base. max_error doesn't always return an int; it happens that if both y_pred and y_true have an int dtype, it returns an int.
We don't enforce this behavior for any other scorer whose usual return type is float but could be an int in some specific setting. For instance, accuracy_score(normalize=False) counts the number of correctly classified samples, yet still returns a float.
Also, the docstring of max_error already states that the return type is float :)
Well then, no need for an exception :)
Co-authored-by: Tim Head <betatim@gmail.com>
2 approvals already, let's merge this!
closes #27339