BUG: quantile should error when weights are all zeros #28589

lucyleeow opened this issue Mar 26, 2025 · 12 comments · May be fixed by #28594 or #28597

@lucyleeow

Describe the issue:

np.quantile with weights being all zeros should probably give an error/warning. It's effectively like asking for a quantile of an empty array. Currently NumPy seems to return the first sample.

Reproduce the code example:

np.quantile([1,2,3,4], 0.5, weights=[0,0,0,0], method='inverted_cdf')

Error message:

/.../numpy/lib/_function_base_impl.py:4858: RuntimeWarning: invalid value encountered in divide
  cdf /= cdf[-1, ...]  # normalization to 1
Out[9]: array(1)

Python and NumPy Versions:

NumPy: 2.1.3
Python: 3.13.0 | packaged by conda-forge | (main, Nov 27 2024, 19:18:50) [GCC 13.3.0]

Runtime Environment:

No response

Context for the issue:

Context: Noticed when looking into scikit-learn/scikit-learn#31032

@seberg
Member

seberg commented Mar 26, 2025

Thanks for reporting! Interesting that it ends up with something after giving an invalid-value warning. CC @lorentzenchr, but I suspect this shouldn't be hard for anyone to fix; it probably just needs an explicit check.

(Ah, I suppose the weights end up NaN, but the rest still works -- somewhat...)
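The guess above can be illustrated with a small sketch (a simplified picture of one plausible mechanism, not NumPy's exact code path): all-zero weights produce an all-NaN normalized CDF, and NaN comparisons then let the lookup fall through to the first sample.

```python
import numpy as np

# With all-zero weights the cumulative sum is all zeros, so the
# normalization step divides 0/0 and leaves an all-NaN CDF.
weights = np.array([0., 0., 0., 0.])
cdf = np.cumsum(weights)          # [0., 0., 0., 0.]
with np.errstate(invalid="ignore"):
    cdf /= cdf[-1]                # 0/0 -> [nan, nan, nan, nan]

# Every comparison against NaN is False, so a sorted search for the
# first position where cdf >= q falls through to index 0, which
# selects the first sample.
idx = np.searchsorted(cdf, 0.5)   # -> 0
```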

@techyorzl

Would love to work on this issue!

@seberg
Member

seberg commented Mar 26, 2025

We don't assign issues, please just work on a PR with tests.

@lorentzenchr
Contributor

Maybe this should be aligned with np.average (for the weighted quantiles, I followed np.average).

@Tontonio3

I also found this weird behavior in np.quantile: if any weight is np.nan it outputs 1, but it doesn't raise any errors.

import numpy as np

wgt_lst = [1] * 100
x = np.linspace(1, 100, 100)
q = np.zeros(100)

# One NaN weight at a time (the weight is restored after each call):
for i in range(100):
    wgt_lst[i] = np.nan
    q[i] = np.quantile(x, 0.5, weights=wgt_lst, method='inverted_cdf')
    wgt_lst[i] = 1

print(np.all(q == 1.))

# NaN weights accumulate (never restored):
q = np.zeros(100)
for i in range(100):
    wgt_lst[i] = np.nan
    q[i] = np.quantile(x, 0.5, weights=wgt_lst, method='inverted_cdf')

print(np.all(q == 1.))

it outputs:

True
True

Also, np.inf in the weights causes weird behaviour.

import numpy as np

a = np.quantile([1,2,3,4], 0.5, weights=[1,1,1,np.inf], method='inverted_cdf')
b = np.quantile([1,2,3,4], 0.5, weights=[np.inf,1,1,1], method='inverted_cdf')
print(a)
print(b)

outputs:

\...\numpy\lib\_function_base_impl.py:4891: RuntimeWarning: invalid value encountered in divide
  cdf /= cdf[-1, ...]  # normalization to 1
4
1

rottenstea added a commit to rottenstea/numpy that referenced this issue Mar 27, 2025
…eights are all zeros numpy#28589

Previously, np.quantile with all weights set to zero returned the first sample. This is now fixed by adding an explicit check that raises a ValueError when all weights are zero, preventing division by zero in the CDF.

- Added the weight check in `_quantile` (line 4874 ff) to detect all-zero weights and raise an error.
- Added a test in `test_function_base.py` to verify this behavior.
- Added a second test to ensure existing quantile functionality remains unchanged.

This should close numpy#28589
@galio1237 galio1237 linked a pull request Mar 27, 2025 that will close this issue
@lucyleeow
Author

Linking related issue: #21091

Not sure what the right behaviour is when a weight is nan; potentially ignore that sample, like numpy.nanquantile does for nan values. If that is the case, maybe only numpy.nanquantile should handle nan weights?

@ngoldbaum
Member

ngoldbaum commented Mar 28, 2025

We now have three open pull requests to close this issue (#28594, #28595, and #28597), all coming in within a day or so. We should probably close two of them. I'd appreciate it if the contributors who opened pull requests for this could figure out which one we should review.

Also, before opening a pull request, please double-check that there isn't an open PR already.

@seberg
Member

seberg commented Mar 28, 2025

I don't think we need to handle nan weights. For all-zero weights, it should be identical to a quantile over no points, i.e. quantile([], ...) without weights (I think an error).
I have not yet looked at the open PRs.
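For comparison, the unweighted empty-array case already fails today (the exact exception type may vary across NumPy versions), which is the behaviour suggested above for all-zero weights:

```python
import numpy as np

# An unweighted quantile over no points errors out rather than
# returning a value; all-zero weights should arguably do the same.
try:
    np.quantile(np.array([]), 0.5)
    raised = False
except Exception:
    raised = True
```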

@seberg
Copy link
Member

seberg commented Apr 1, 2025

CC @rottenstea , @Tontonio3, and @galio1237 can you have a look at #28589 (comment)?

I think all of these PRs need some work/thought still. I don't want to review all 3 and some got helpful comments already, so I'll make some broad comments here:

  • Before we add this type of check to 2 out of 4 code paths, I would like to see a refactor that introduces a new function (I doubt negative weights in average make sense, in which case we don't need a new function, just a rename). @lorentzenchr had hinted at that, since he mentioned this code is based on patterns also used elsewhere.
  • I don't like dtype=object special cases unless they are very important. Just don't add them, and rather explain why you are tempted to, so we can decide if it's worthwhile.
  • As Marten commented, find the minimal checks. Personally, I wouldn't care about distinct error messages, but that is a matter of opinion.
  • The way comparisons are done matters if you are worried about NaNs. E.g. all(value > 0) is very different from not any(value <= 0).
  • Some checks may make more sense after summing (before the normalization). But we should also see if they are even needed (if they are only about improving the error message, we may not need them).

(If I trusted compilers/systems, all we would actually need to do is add a with np.errstate(invalid="error"): around an astype(intp) somewhere. @mattip wanted to look at that recently, but for now we don't have it.)
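The point about NaN-aware comparisons can be demonstrated in a few lines (a standalone illustration, not code from any of the PRs):

```python
import numpy as np

w = np.array([1.0, np.nan])

# Every comparison with NaN is False, so these two superficially
# equivalent checks disagree once a NaN weight is present:
strict = np.all(w > 0)          # False: nan > 0 is False
loose = not np.any(w <= 0)      # True:  nan <= 0 is also False
```

For validation purposes, the `not any(...)` form silently lets NaN weights pass, while the `all(...)` form rejects them.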

@lorentzenchr
Contributor

I doubt negative weights in average make sense

I have to raise concerns against this statement. Please do not restrict weights in any way for np.average. There are myriad applications of taking averages, and some of them require negative weights for very good reasons.

For quantiles, it can be argued, things are different: an empirical CDF is calculated from the weights, which are therefore required to be non-negative and to sum to a positive number.
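To make the CDF argument concrete, here is a simplified sketch of a weighted inverted-CDF quantile (illustrative only; the function name and validation are assumptions, not NumPy's implementation):

```python
import numpy as np

def weighted_inverted_cdf(values, weights, q):
    # Simplified sketch of an inverted-CDF weighted quantile:
    # sort the samples, build a normalized cumulative weight curve,
    # and return the smallest value whose cumulative weight reaches q.
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    total = weights.sum()
    if not total > 0:  # rejects all-zero (and NaN-sum) weights
        raise ValueError("weights must sum to a positive value")
    cdf = np.cumsum(weights) / total
    return values[np.searchsorted(cdf, q, side="left")]
```

With values [1, 2, 3, 4] and equal weights, q=0.5 lands on 2, matching method='inverted_cdf'; all-zero weights are rejected up front instead of propagating NaN through the normalization.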

@seberg
Member

seberg commented Apr 1, 2025

some of them require negative weights for very good reasons

Cool, thanks for pointing that out. Sorry, I think there were more functions than just percentile/quantile that check < 0, but I think I just mistook them for others...

@Tontonio3

Tontonio3 commented Apr 2, 2025

I don't like dtype=object special cases unless there are very important. Just don't add them and rather explain why you are tempted to, so we can decide if it's worthwhile.

There is a test in .\numpy\lib\tests\test_function_base.py that uses dtype=object, which causes np.isnan and np.isinf to give an error.

That's why I opened issue #28602

6 participants