-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: add weighted quantile for inverted_cdf #24254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add weighted quantile for inverted_cdf #24254
Conversation
@seberg friendly ping. I know numpy 2.0 requires resources, but maybe you could have a quick look. The underlying issue is very old with a ton of discussions. (And I need help to fix the emscripten CI error.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am even here unsure whether this should be called weights
or something more clear? For inverted_cdf
I guess it doesn't matter (unless someone expects more than a single datapoint being mixed in the final step if its weight < 1
, but I am not sure that can be argued to even make sense).
numpy/lib/function_base.py
Outdated
# setup wgt to broadcast along axis | ||
wgt = np.broadcast_to(wgt, (a.ndim-1)*(1,) + wgt.shape) | ||
wgt = wgt.swapaxes(-1, axis) | ||
return wgt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting logic, but prior art in either case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why interesting? I just shifted these lines of code out of average
for reuse in quantile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic tries to add convenience for an N-D input with with axis and 1-D (or other input) and I find that very dubious guessing of user intent with relatively little gain (it isn't hard to make sure weights and data are properly broadcast/aligned) and potential for confusion.
(The potential for terrible bugs where you expect swapping of weights to happen but it doesn't because shapes happen to be identical seems possible but unlikely.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, that code is not new or from me. I just took these lines out of np.average
for reuse in np.quantile
.
numpy/lib/function_base.py
Outdated
weights = np.moveaxis(weights, axis, destination=0) | ||
index_array = np.argsort(arr, axis=axis, kind="stable") | ||
# TODO: Which value to set to slices_having_nans. | ||
slices_having_nans = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just [-1]
after sorting.
@seberg Thank you very much for having a look! |
Do I have to add tests for n-dim data and that |
@seberg I put more effort into it to also test several |
@ogrisel uncovered some shortcomings that I'm trying to solve, in particular the I'm stuck because of the following error: import numpy as np
np.take(np.arange(10), np.array([1, 4]), out=np.zeros(2, dtype=float))
How can I write the content of some slice of an int64 array into a float64 array? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now ready for a - hopfully - final review. Test coverage is much extended and corner cases like nan handling are now fixed (and tested), no todo left. Edit: And I like the 100% 🟢 of the current CI. |
@seberg Do you think this could get in numpy 2.0? That would be my hope. |
Thanks, @lorentzenchr and @ogrisel for reviewing. The new changes look good, and the new tests were really needed. I did have a look at the shape logic, and it seems really quite clean to me, only comments I added were so small, that there would be no point in following up. Since voiced anything about the added API here or on the mailing list. Putting this in. |
Partially solves #8935.
This PR adds support for
weights
innp.quantile
similar tonp.average(..., weights=w)
. As an uncontroversial start, onlymethod="inverted_cdf"
is implemented for weights support.See #8935 (comment) for details.