Inconsistent return type in statistics.median_grouped() #92531
Comments
…-92531 (pythonGH-92533) (cherry picked from commit e01eeb7) Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
@rhettinger I'm not going to ask for this change to be reverted. I think that so long as the function does convert to float, you are right that it should do so consistently. But this function is not intended to always return floats. The documentation says:

CPython implementation detail: Under some circumstances, median_grouped() may coerce data points to floats. This behaviour is likely to change in the future.

The intent is that if your data is homogeneous Decimals or Fractions, it should honour those types, not coerce them into floats. (Obviously the function doesn't do that at the moment.) The problem is that the data stream (list, iterator etc) may not be homogeneous, and I'm not sure how to handle the case of heterogeneous data. The other statistics functions mostly try very hard to coerce heterogeneous types into a common type, but that has performance costs and I don't think it is appropriate for the median functions.

I have some ideas regarding that, e.g. to single out the first data value as the canonical type, and then go on assuming the rest of the data values are compatible. But until the behaviour changes, I concede that your change is an improvement.
Except for the inconsistent special case, the code has returned a float regardless of input type for almost a decade. There was even a test to ensure that was the case. The comment in the test indicates that the original implementers gave up on trying to make median_grouped() preserve the input types. Since then, no one has touched the code or tests. Though it's risky at this point, the implementation note in the docs leaves the door open for change. To follow through with the original intent, make the following edit:
That gives a run like this:
If an inconsistent type is given for interval, then a TypeError is raised:
If the input data has inconsistent types, that is ignored. Only the value at the midpoint is used (as one would expect from a median calculation).
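The edit and the sample runs referred to above did not survive in this copy of the thread. As a rough illustration only, here is a minimal sketch of what a type-preserving grouped median could look like. The function name `median_grouped_preserving` and the rearranged formula are assumptions made for this example; this is not the proposed patch and not the standard library implementation.

```python
from bisect import bisect_left, bisect_right
from decimal import Decimal
from fractions import Fraction
from statistics import StatisticsError


def median_grouped_preserving(data, interval=1):
    """Hypothetical sketch: grouped median without float() coercion."""
    data = sorted(data)
    n = len(data)
    if not n:
        raise StatisticsError("no median for empty data")

    # Only the value at the midpoint is used; other data types are ignored.
    x = data[n // 2]

    # All occurrences of x lie within data[i:j].
    i = bisect_left(data, x)          # cumulative frequency below the class
    j = bisect_right(data, x, lo=i)
    f = j - i                         # frequency of the median class

    # Same value as L + interval * (n/2 - cf) / f with L = x - interval/2,
    # rearranged so every intermediate factor is an int. That keeps
    # Fraction and Decimal inputs in their own type.
    k = n - 2 * i - f
    return (x * (2 * f) + interval * k) / (2 * f)


data = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
print(median_grouped_preserving([Fraction(v) for v in data]))  # 37/10
print(median_grouped_preserving([Decimal(v) for v in data]))   # 3.7
print(median_grouped_preserving(data))                         # 3.7
```

In this sketch, int data still produces a float because of true division, while Decimal data combined with a float interval raises a TypeError, since Decimal and float do not mix in arithmetic.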
The median_grouped() function erroneously returns inconsistent types in its special case path for a single argument. The function should always return a float.
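A quick way to check for the reported inconsistency is to compare the return types of the two code paths. This is a small sketch; the result for the single-element call depends on whether the Python version in use includes the fix, so no output is shown for it.

```python
import statistics

# Multi-element input takes the interpolation path, which coerces to float.
general = statistics.median_grouped([1, 2, 2, 3])

# Single-element input takes the special-case path described in this report.
# On affected versions the data point should come back unchanged (an int
# here); on versions with the fix it should come back as a float.
single = statistics.median_grouped([1])

print(type(general).__name__, type(single).__name__)
```

If the two type names differ, the interpreter still has the inconsistent special case.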