mean_squred_error giving wrong results #28827
Comments
One could argue that about any dtype, since overflow can happen in any of them. So I think it's the user who needs to know what dtype makes sense for such an operation. Otherwise we'd need to upcast every input in every method to the highest available dtype / precision, and we can't do that. cc @ogrisel @betatim since this is similar to the case we had for the array API. Closing for now, happy to have it reopened if we think we should do something about it.
Obviously it's your choice to make, but I want to highlight a few points:
I also want to highlight that for integer types the case is much worse as:
In my opinion, as can be argued by points 3 and 4, it is reasonable to only upcast to floating-point dtype in calculations of metrics like
Describe the bug
I have recently noticed a bug in the implementation of mean_squared_error in sklearn.metrics.
The current implementation of the function basically calculates the MSE as follows:
This is reasonable in most cases, but it may return wrong results when y_true and y_pred have a dtype with a low bit count, for example np.uint8, which ranges from 0 to 255.
The reason is that when the calculation is done on arrays of types like np.uint8, overflows are very likely to occur (and are not reported in any way!), resulting in wrong results.
To resolve this, y_true and y_pred should first be cast to a dtype big enough that overflows will not occur with reasonable errors, such as float64.
For example:
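A minimal sketch of the cast described above, assuming plain NumPy arrays as input. The helper name mse_upcast is hypothetical and is not part of scikit-learn; it only illustrates converting both inputs to float64 before subtracting.

```python
import numpy as np


def mse_upcast(y_true, y_pred):
    """Hypothetical helper (not a scikit-learn function): MSE computed in float64."""
    # Cast first, so the subtraction and squaring happen in float64
    # instead of the (possibly narrow) input dtype.
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean((y_true - y_pred) ** 2))


# Illustrative uint8 inputs (assumed values, chosen so some differences are negative).
y_true = np.array([0, 10, 200], dtype=np.uint8)
y_pred = np.array([16, 10, 100], dtype=np.uint8)
result = mse_upcast(y_true, y_pred)  # mean of 256, 0 and 10000
```

Casting before the subtraction matters: casting only the final result would not help, because the wraparound has already happened by then.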
Steps/Code to Reproduce
Expected Results
Expected result is 256 as (0 - 16)**2 = 256
Actual Results
The result of mean_squared_error is 0
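The arithmetic behind this result can be traced with plain NumPy. This is a sketch under the assumption that the inputs were uint8 arrays holding 0 and 16 (inferred from the expected value above, since the reproduction snippet is not preserved here); it does not call scikit-learn and only mirrors a subtract, square, and average sequence.

```python
import numpy as np

# Assumed inputs: single-element uint8 arrays holding 0 and 16.
y_true = np.array([0], dtype=np.uint8)
y_pred = np.array([16], dtype=np.uint8)

# 0 - 16 cannot be represented in uint8, so it silently wraps modulo 256.
diff = y_true - y_pred

# The product is still uint8, so it wraps modulo 256 a second time.
squared = diff * diff

# Averaging the wrapped values gives the wrong MSE.
naive_mse = float(np.mean(squared))

print(diff.dtype, int(diff[0]), int(squared[0]), naive_mse)
```

Here the difference wraps to 240, and 240 * 240 = 57600 is an exact multiple of 256, so the squared error wraps to 0 with no warning raised.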
Versions