BUG ensure monotonic property of lerp in numpy.percentile #15098


Closed
wants to merge 15 commits

Conversation

@glemaitre

closes #14685

Change the way percentiles are computed (and linearly interpolated) to ensure the monotonicity property.
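[Editor's note] A minimal sketch of the idea behind the fix (not the PR's exact code; the helper name is made up): the naive form a + (b - a) * t can lose monotonicity near t = 1 due to floating-point rounding, so the interpolation is instead computed from whichever endpoint is nearer.

```python
import numpy as np

def lerp_monotonic(a, b, t):
    """Interpolate between a (t=0) and b (t=1), computing from the
    nearer endpoint so the result is monotonic in t and reproduces
    both endpoints exactly."""
    a, b, t = np.asarray(a), np.asarray(b), np.asarray(t)
    # naive form, accurate for t < 0.5
    r = a + (b - a) * t
    # symmetric form, accurate for t >= 0.5
    return np.where(t >= 0.5, b - (b - a) * (1 - t), r)
```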

@glemaitre (Author)

ping @seberg, @eric-wieser, @arthertz, who might be interested in having a look at this.

@glemaitre (Author)

Are there other regression tests that would be meaningful?
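[Editor's note] One candidate regression check (a sketch, not taken from the PR): since the linked bug report is that np.percentile output is not sorted, asserting that the result is non-decreasing for increasing q exercises exactly that property.

```python
import numpy as np

# For increasing quantiles q, the interpolated percentiles must
# themselves be non-decreasing; the bug in gh-14685 was that
# rounding in the interpolation could violate this.
rng = np.random.default_rng(0)
a = rng.random(100)
q = np.linspace(0, 100, 1001)
p = np.percentile(a, q)
assert np.all(np.diff(p) >= 0), "percentile output must be sorted"
```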

@glemaitre (Author)

The failures do not seem to be related to the changes in this PR.

@ogrisel (Contributor) left a comment

Small comments:

@mattip mattip requested a review from eric-wieser December 23, 2019 07:46
@seberg previously approved these changes Dec 23, 2019

@seberg (Member) left a comment

Looks good to me. @eric-wieser want to squash merge it when you are happy? Since percentile should normally do most of its work in looking up the indices and not in the calculation, I do not think I would worry about speed.
I think we do not need release notes here.

@glemaitre (Author)

Tests are passing

@glemaitre (Author)

@eric-wieser do you require any further changes?

@seberg seberg self-requested a review January 16, 2020 23:18
@mattip (Member)

mattip commented Jan 23, 2020

@seberg ping. This is both approved and "pending review"

Comment on lines 3936 to 3937
x1 = np.moveaxis(x1, axis, 0)
x2 = np.moveaxis(x2, axis, 0)
(Member)

If you kept these lines, and did them before the computation, then you'd be able to do:

r_above = np.add(x1, diff_x2_x1 * weights_above, out=out)
r_below = np.subtract(x2, diff_x2_x1 * weights_below, out=r_above, where=weights_above < 0.5)

@seberg (Member) left a comment

OK, I mainly wanted to look at the formula, and the logic seems fine to me. The test failures are real, and so is the issue with out being a scalar. I think reorganizing the calculation to make it closer to the original (moving it to where the add was) should fix that issue.


# ensure axis with q-th is first
x1 = np.moveaxis(x1, axis, 0)
x2 = np.moveaxis(x2, axis, 0)
(Member)

Move this before the diff calculation, I think.

(Member)

You'll need to call moveaxis on the weight too, I think

(Member)

Wait, do we even need this? We already moved the axis above...

I think perhaps the take should be axis=0

(Member)

Ah, axis is already 0 here, so this is a no-op

r_above = np.add(x1, diff_x2_x1 * weights_above, out=out)
r_below = np.subtract(
    x2, diff_x2_x1 * weights_below, out=r_above,
    where=weights_above < 0.5
)
(Member)

This is fun, but in that case please just call it r (or something longer) rather than above and below, and add a comment that we first calculate everything for "above" and then replace it with the "below" version.

Are there tests for the combinations of "zerod" and out=... being given? This smells to me like we could get into trouble here. It is probably possible to move the if zerod block before these calculations to avoid that problem.
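[Editor's note] The out=/where= pattern under discussion can be illustrated standalone (the variable names mirror the diff, but the values here are made up). A ufunc call with where= leaves the masked-out entries of out untouched, so the "above" result survives wherever the condition is False.

```python
import numpy as np

x1 = np.array([0.0, 0.0, 0.0])
x2 = np.array([10.0, 10.0, 10.0])
t = np.array([0.25, 0.5, 0.75])
diff = x2 - x1

# compute the "above" form everywhere ...
r = np.add(x1, diff * t)
# ... then overwrite in place, only where the "below" form
# (measured from the other endpoint) is the accurate one
np.subtract(x2, diff * (1 - t), out=r, where=t >= 0.5)
# r is now [2.5, 5.0, 7.5]
```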

(Member)

I think the point is we made everything 1d at the top of the function, so the zerod block should remain below - to convert the final result back to 0d.

Perhaps we should push forward on the leavewrapped thing at some point.

(Member)

But out is never made 1-D, so we need to be careful?

(Member)

Yeah, this is a good example of the need for leavewrapped.


if out is not None:
    r = add(x1, x2, out=out)
    out[...] = r_above
    r = out
(Member)

Assignment to out should not be necessary here, the above operation is already in-place (i.e. similar to the old branch, which did not do this).

(Member)

@seberg is the double-assignment cleanup a blocker?

(Member)

Not on its own, but I think the above also needs some cleanups, maybe I should just put it on my todo list to do it myself.

@charris (Member)

charris commented Jan 29, 2020

Tests are failing.

@WarrenWeckesser (Member)

It would be nice to get this working.

@glemaitre, will you be able to resume working on this? There are a few issues still to be resolved.

@seberg (Member)

seberg commented Apr 10, 2020

@glemaitre do you have time/want to pick this up? Otherwise I am happy to drag it over the finish line.

@glemaitre (Author)

@seberg I am sorry, I don't think I will get any time to finish this up in the next 3 weeks. If you think you can finalize it in the meantime, do not hesitate.

@seberg seberg self-requested a review April 29, 2020 20:55
@seberg seberg dismissed their stale review April 29, 2020 20:56

needs some fixups and re-review.

@eric-wieser (Member)

Perhaps we ought to extract a lerp helper function here?

@seberg (Member)

seberg commented May 17, 2020

Yeah, we could create a hidden ufunc; it should be simple enough. Maybe I will look at it at the beginning of the week.

@eric-wieser (Member)

For now I've got a local patch that extracts it to a pure python function, which builds upon #16274.

@eric-wieser (Member)

Alright, my cleanups are in - there's now a private _lerp function which this diff can be confined to.

@seberg (Member)

seberg commented Jun 27, 2020

Fixed in gh-16273. Thanks @glemaitre, the credit for figuring out how best to fix this goes fully to you!

@seberg seberg closed this Jun 27, 2020
Successfully merging this pull request may close these issues.

BUG: numpy.percentile output is not sorted
7 participants