
Broken bool support in NumPy's percentile #19154


Closed
HyukjinKwon opened this issue Jun 3, 2021 · 3 comments
Labels
06 - Regression · triaged (Issue/PR that was discussed in a triage meeting)

Comments

@HyukjinKwon

#16273 (comment) broke np.percentile for boolean input, which in turn breaks pandas' quantile:

import pandas as pd
pd.DataFrame({"i": [0, 1, 2], "b": [False, False, True], "s": ["x", "y", "z"]}).quantile(q=0.5, numeric_only=True)

Before

i    1.0
b    0.0
Name: 0.5, dtype: float64

After

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../python3.8/site-packages/pandas/core/frame.py", line 9266, in quantile
    result = data._mgr.quantile(
  File "/.../python3.8/site-packages/pandas/core/internals/managers.py", line 491, in quantile
    block = b.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/.../python3.8/site-packages/pandas/core/internals/blocks.py", line 1592, in quantile
    result = nanpercentile(
  File "/.../python3.8/site-packages/pandas/core/nanops.py", line 1675, in nanpercentile
    return np.percentile(values, q, axis=axis, interpolation=interpolation)
  File "<__array_function__ internals>", line 5, in percentile
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
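Both tracebacks end in _lerp, NumPy's linear-interpolation helper. As a simplified sketch of the default "linear" method (not NumPy's exact code), with sorted values x_(0), ..., x_(n-1) and q given as a fraction (np.percentile divides its q by 100 first), the quantile is computed as below. The worked line uses column "b" from the pandas example, where q = 0.5 is the median.

```latex
\begin{aligned}
h &= q\,(n-1), \qquad j = \lfloor h \rfloor, \qquad g = h - j,\\
Q(q) &= x_{(j)} + g\,\bigl(x_{(j+1)} - x_{(j)}\bigr),\\[4pt]
x &= (0, 0, 1),\ n = 3,\ q = 0.5:\quad h = 0.5 \cdot 2 = 1,\ j = 1,\ g = 0,\\
Q &= x_{(1)} + 0 \cdot \bigl(x_{(2)} - x_{(1)}\bigr) = 0 + 0 \cdot (1 - 0) = 0.
\end{aligned}
```

The bracketed difference corresponds to the subtract(b, a) call at the bottom of the traceback. In the code path shown there it is evaluated even though g = 0 here, so boolean input fails regardless of the interpolation weight.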

Reproducing code example:

import numpy as np
np.percentile([True, False, False], q=0.5)

Error message:

Before (NumPy 1.19)

0.0

After

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in percentile
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
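The failing step can be reproduced in isolation. This sketch only exercises np.subtract, np.logical_xor and astype on small boolean arrays; it does not call NumPy's internal _lerp. It shows why the alternatives suggested in the error message do not help: XOR is not the numeric difference an interpolation needs, whereas upcasting first is.

```python
import numpy as np

a = np.array([False, False])
b = np.array([False, True])

# Boolean subtraction is rejected; this is the call _lerp makes first.
raised = False
try:
    np.subtract(b, a)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# The operation suggested by the error message works, but it computes XOR,
# which is not the numeric difference b - a that interpolation needs.
xor = np.logical_xor(b, a)
print(xor)

# Upcasting to float first yields the difference a lerp expects.
diff = np.subtract(b.astype(float), a.astype(float))
print(diff)
```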

NumPy/Python version information:

NumPy 1.20.3, Python 3.8.8 (default, Apr 13 2021, 12:59:45)
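On affected versions, one possible workaround (an assumption on our part, not something proposed in this thread) is to convert the booleans to 0.0/1.0 before calling np.percentile, so the interpolation only ever sees floats. Note that np.percentile takes q in percent (0 to 100), so q=0.5 in the report is the 0.5th percentile, while q=50 is the median that pandas' quantile(q=0.5) asks for.

```python
import numpy as np

values = [True, False, False]

# Hypothetical workaround: convert booleans to 0.0/1.0 up front so that
# np.percentile never has to subtract boolean values.
as_float = np.asarray(values, dtype=float)

p_half = np.percentile(as_float, q=0.5)   # 0.5th percentile, as in the report
p_median = np.percentile(as_float, q=50)  # median
print(p_half, p_median)
```

Both calls return 0.0 here, matching the "Before (NumPy 1.19)" output, because the sorted data is (0, 0, 1) and both positions fall between or on the two zeros.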
@seberg
Member

seberg commented Jun 10, 2021

I am still not sure whether this is actually a bad change, but it should not have happened without some warning. This was already a regression in 1.20, though.

@seberg seberg added the triage review (Issue/PR to be discussed at the next triage meeting) and triaged (Issue/PR that was discussed in a triage meeting) labels, then removed the triage review label, Jun 10, 2021
@seberg
Member

seberg commented Jun 16, 2021

We discussed this briefly and should try to fix it (assuming it is not too tricky), then backport the fix to 1.20.x (one more release is anticipated there).

That may mean a deprecation instead of the error, though, since it is unclear whether this is actually reasonable behaviour.
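To illustrate what "a deprecation instead of the error" could look like from a caller's point of view, here is a minimal sketch of a hypothetical wrapper (percentile_bool_deprecated is our name, not a NumPy API, and this is not the fix NumPy adopted). It keeps boolean input working by upcasting it, but emits a DeprecationWarning so users can migrate to an explicit cast.

```python
import warnings

import numpy as np


def percentile_bool_deprecated(a, q, **kwargs):
    """Hypothetical wrapper: accept boolean input, but warn and upcast it."""
    arr = np.asanyarray(a)
    if arr.dtype == np.bool_:
        warnings.warn(
            "percentile of a boolean array is deprecated; "
            "cast to a numeric dtype explicitly",
            DeprecationWarning,
            stacklevel=2,
        )
        # Upcast so the interpolation step never subtracts booleans.
        arr = arr.astype(np.float64)
    return np.percentile(arr, q, **kwargs)


# DeprecationWarning is hidden by default outside __main__, so record it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = percentile_bool_deprecated([True, False, False], q=50)
print(result, [w.category.__name__ for w in caught])
```

The design choice here is that the warning fires only for boolean dtypes; numeric input goes straight to np.percentile unchanged.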

bzah added a commit to bzah/numpy that referenced this issue Sep 14, 2021
The work on this branch removed the handling of an unnecessary special case
(when indices are integers), which this test case covered.
However, percentile/quantile has been broken for a while when the input array
is boolean, and it is not clear what should be done to fix this.

The unit test now behaves like any other boolean array
and raises a TypeError.

See
- numpy#19857 (comment)
- numpy#19154
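Because boolean input raised a TypeError in some releases and not in others, downstream code may prefer to feature-detect rather than hard-code version ranges. A minimal sketch, assuming the only failure mode of interest is the TypeError shown above (bool_percentile_supported is a hypothetical helper, not part of NumPy or pandas):

```python
import numpy as np


def bool_percentile_supported():
    """Return True if this NumPy build accepts boolean input to np.percentile.

    Hypothetical helper: it simply tries the call and treats the TypeError
    from boolean subtraction as "unsupported".
    """
    try:
        np.percentile(np.array([True, False, False]), 50)
    except TypeError:
        return False
    return True


print(np.__version__, bool_percentile_supported())
```

If it returns False, a caller can fall back to casting the data to float before computing percentiles.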
bzah added a commit to bzah/numpy that referenced this issue Oct 6, 2021
- Added the missing linear interpolation methods.
- Updated the existing unit tests.

- Added pytest.mark.xfail for boolean arrays
See
- numpy#19857 (comment)
- numpy#19154
bzah added a commit to bzah/numpy that referenced this issue Oct 6, 2021
bzah added a commit to bzah/numpy that referenced this issue Oct 21, 2021
seberg pushed a commit to bzah/numpy that referenced this issue Nov 1, 2021
seberg pushed a commit to bzah/numpy that referenced this issue Nov 4, 2021
@seberg
Member

seberg commented Jun 10, 2022

This should have been fixed by the PR linked above: the test referencing this issue (marked XFAIL) now passes. So I will close this, and try to remember to unmark that test (I was looking at some datetime-related fixes in that corner).

@seberg seberg closed this as completed Jun 10, 2022