-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: numpy.percentile output is not sorted #14685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Why would you expect it to be sorted? Percentile is elementwise - the outputs are in the order of the inputs. |
Hi ! |
There are some numerical errors within a single ULP, that differ for different inputs with the same output value. I doubt there is anything to be done about that. |
A slightly reduced failing case:
here showing non-sorted-ness via the diff. I think there probably is something we can do about this. I think this comes down to the stability of these lines, which perform a numpy/numpy/lib/function_base.py Lines 3907 to 3908 in b9fa88e
numpy/numpy/lib/function_base.py Lines 3928 to 3929 in b9fa88e
numpy/numpy/lib/function_base.py Lines 3939 to 3942 in b9fa88e
There are a bunch of tradeoffs to be made when linearly interpolating floating point values, but I suspect that there's a "correct" choice here, and we just haven't made it. Some more background here: https://math.stackexchange.com/questions/907327/accurate-floating-point-linear-interpolation |
Yeah, I agree, +1 on reorganizing the operations so that it is strictly monotonic (numerically). Would be good if it is also no worse, or at least almost identical precision wise. I am sure we really do not have to worry about a few extra operations/speed here. EDIT: Marked as good first issue. This is only a good first issue if you are willing to dive into the intricacies of IEEE floating point numbers. But after that, this is probably a fairly straight forward reorganization within python code. |
I would be interested in taking on this issue. I was looking at some of the failing cases and noticed that they all involved linearly interpolating between the same number. i.e. in Eric's example all of the percentiles he listed listed are located in between two 9s. Therefore I think the linear interpolation between them must be 9 exactly correct? fixing the problem of linearly interpolating between two number that are the same seems like it would deal with the issues presented in this bug and not cause a noticeable hit in performance. If however we want to ensure that the linear interpolation will be monotonic always, we can do that but It will require a piecewise function that I would think would decrease performance. |
@ngonzo95 there should be a way to spell the arithmetic of the interpolation differently to achieve this, i.e. change/rearrange the formula that is used for the calculation (so that it is mathematically identical, but numerically guarantees monotonicity). No piecewise calculation should be necessary. |
It depends what your requirements on
( |
Oh OK, I did not expect piecewise to be necessary, but do not know the intrinsicaties of this well enough I guess. |
looking into it more I discovered that the function a + (b-a)*t has the property of being both monotonic (definition noted above) and consistent (lerp (a, a, t) = a). I believe this should be sufficient for the functions requirements. It seems one of the main draw backs of this function is that lerp(a, b, 1) !=b. However I think the way we are calculating weights ensures that 0<=t<1. |
Note that unfortunately |
New to the open source. |
In scikit-learn, we recently stumbled in this issue: scikit-learn/scikit-learn#15733 Since we expect |
@glemaitre: All of the relevant lines in numpy are linked in my comment above, #14685 (comment) |
Hey, there seems to have been an update to one of the stackexchange answers provided by @eric-wieser with a good alternative interpolation. |
Note that there is another issue with lerp in |
The output of
numpy.percentile
is not always sortedReproducing code example:
Error message:
[ True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True False False True True True True False
True True True False]
AssertionError Traceback (most recent call last)
in
1 q = np.percentile(np.array([0, 1, 1, 2, 2, 3, 3 , 4, 5, 5, 1, 1, 9, 9 ,9, 8, 8, 7]) * 0.1, np.arange(0, 1, 0.01) * 100)
2 equals_sorted = np.sort(q) == q
----> 3 assert equals_sorted.all()
AssertionError:
Numpy/Python version information:
1.17.2 3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
The text was updated successfully, but these errors were encountered: