Skip to content

BUG,DEP: Fixup quantile/percentile and rename interpolation->method #20327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 16, 2021

Conversation

seberg
Copy link
Member

@seberg seberg commented Nov 8, 2021

This (unfortunately currently) does 3 things:

  • It renames and deprecates the interpolation kwarg as method
  • It fixes the documentation current plot to use the new methods (see below)
  • It fixes two small issues with some discrete methods and tests them by expanding the "monoticity" test case

The main method docstring is modified to:

    method : str, optional
        This parameter specifies the method to use for estimating the
        percentile.  There are many different methods, some unique to NumPy.
        See the notes for explanation.  The options aligning with the R types
        and the H&F paper are:

        * (H&F 1): 'inverted_cdf'
        * (H&F 2): 'averaged_inverted_cdf'
        * (H&F 3): 'closest_observation'
        * (H&F 4): 'interpolated_inverted_cdf'
        * (H&F 5): 'hazen'
        * (H&F 6): 'weibull'
        * (H&F 7): 'linear'  (default)
        * (H&F 8): 'median_unbiased'
        * (H&F 9): 'normal_unbiased'

        Mainly for compatibility reasons, NumPy also supports the following
        options which appear to be unique to NumPy:

        * 'lower'
        * 'higher',
        * 'midpoint'
        * 'nearest'

        .. versionchanged:: 1.22.0
            This argument was previously called "interpolation" and only
            offered the "linear" default and last four options.

Here is the plot, note that it is smaller in the artifact: https://23201-908607-gh.circle-artifacts.com/0/doc/build/html/reference/generated/numpy.percentile.html

new-percentile-plot
(yeah, colors/linestyles are a bit annoying, some of the colors match, because they are somewhat related, I think)

Closes gh-20306.

@@ -4445,19 +4492,22 @@ def _lerp(a, b, t, out=None):

def _get_gamma_mask(shape, default_value, conditioned_value, where):
out = np.full(shape, default_value)
out[where] = conditioned_value
np.copyto(out, conditioned_value, where=where, casting="unsafe")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally condition_value is just a scalar, so this doesn't matter. But in one of the branches, it is an array that matches the shape of out, so we have to use the copyto logic of copying both. (To be honest, the casting is just for sports right now.)

@seberg
Copy link
Member Author

seberg commented Nov 8, 2021

@bzah if you have some time, maybe you can also have a look?

@seberg seberg force-pushed the rename-interpolation branch 2 times, most recently from 15b731c to bf047da Compare November 8, 2021 23:23
Copy link
Member

@BvB93 BvB93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rename the interpolation keyword in the stub file?
With the way things are currently set up you'll only have to update np.percentile.

@seberg seberg force-pushed the rename-interpolation branch from 61ab79b to c9f06ad Compare November 9, 2021 00:47
@seberg seberg added this to the 1.22.0 release milestone Nov 10, 2021
@bzah
Copy link
Contributor

bzah commented Nov 10, 2021

@seberg maybe it would be interesting to have a separate graph with only the discontinuous methods ?
Especially because someone (not sure of the ping) said he have some use of numpy's specific "lower" and "higher" methods.

For the record, the discontinuous methods are: "inverted_cdf", "averaged_inverted_cdf", "closest_observation", and "lower", "higher", "nearest" and "midpoint".
By keeping only the continuous methods from your graph, it would make it a bit easier to read as well.

The names are also quite confusing, without a graph one could potentially expect the same results from averaged_inverted_cdf and midpoint as well as from nearest and closest_observation.

@seberg
Copy link
Member Author

seberg commented Nov 10, 2021

I guess I could do two plots, one with all the discrete and one with the non-discrete. Although I wonder if it isn't also nice to mix. Right, closest_observation seems the most confusing one. Frankly, I wonder if it isn't worse than "midpoint", but I guess it probably has its purposes. Should we consider naming it differently?

Wolfram says: observation number closest to q n

(wikipedia is similar) The tricky part is the q n, which I am not sure if it is obvious?

@rossbar
Copy link
Contributor

rossbar commented Nov 10, 2021

I guess I could do two plots, one with all the discrete and one with the non-discrete.

I think there's merit in both approaches: having a single plot is nice because all of the information is collected in one place; however, as mentioned, the plot is quite busy. Two plots would have the advantage of being less busy, but the discontinuous distinction may not be particularly meaningful for the average user just trying to learn about the different methods.

FWIW I'd vote to add the plot as-is then evaluate alternatives in a follow-up PR to make comparison easier.

@xor2k
Copy link
Contributor

xor2k commented Nov 10, 2021

Just found some other projects where they use the numpy-based notation, if this is of interest: https://www.tensorflow.org/probability/api_docs/python/tfp/stats/percentile
https://docs.cupy.dev/en/stable/reference/generated/cupy.quantile.html (a GPU version of Numpy)
https://docs.rapids.ai/api/cudf/nightly/api_docs/api/cudf.DataFrame.quantile.html (GPU-Pandas by Nvidia)
Interestingly, PyTorch only supports linear interplation:
https://pytorch.org/docs/stable/generated/torch.quantile.html

@xor2k
Copy link
Contributor

xor2k commented Nov 10, 2021

In the Numpy Zoom session today I've mentioned that we use 'lower' and 'upper' interpolation. We typically do so to compensate a bias and ensure that we obtain a value which has actually been measured (especially a value which is integral if all measurements are integral). If there are no NaNs, this can also be achieved by sorting along one axis and picking the target value, e.g. array entry 3 if there are 4 entries if one wants interpolation='upper' and q=0.5 ("slightly above median", especially for larger arrays). It would also be possible with interpolation='linear' and picking q=(3-1)/(4-1) = 2/3, but this could yield some numerical instability, with 2/3 being a good example.

Probably changing our codebase to np.sort would be more appropriate (and not too much of an effort) anyway. I was maybe a little bit too worried my favorite (interpolation) methods would go away in the future ;)

seberg and others added 9 commits November 12, 2021 12:11
Also, the closest-observation did not correctly support multiple
quantiles calculated at the same time (broadcasting error).
Apparently, sphinx does not resolve references to footnotes from
parameter descriptions.
Co-authored-by: abel <aoun@cerfacs.fr>
Mainly fixes the method list slightly, tones down the warning a
bit and fixes the link to the paper (I did not realize that the
link failed to work due only because the reference was missing
from nanquantile/nanpercentile).
Also updates the old release note to include the info (and generally
tweaks it a bit)
@seberg seberg force-pushed the rename-interpolation branch from f3d8e28 to be15716 Compare November 12, 2021 18:12
@seberg
Copy link
Member Author

seberg commented Nov 12, 2021

OK, updated, hopefully pretty much final, but especially release note bike shedding very welcome :):

  1. tweaked the docstring, slightly less discouraging now (but still discouraging in the first paragraph which leads with "unique to NumPy"). Mention that some methods are discontinuous and all of the "bonus" ones are.
  2. Tweaked the current release note.
  3. Added a second deprecation release note.
  4. Readded the reference to the paper to link from it, and use a direct enumeration.

@seberg
Copy link
Member Author

seberg commented Nov 12, 2021

Btw. I won't worry about projects like tensorflow personally, they should be able to keep up. We can't keep potentially serious flaws in what methods we offer stand in the way.
As for pandas, I pinged them, and they seem to fine with the deprecation.

[np.percentile, np.quantile, np.nanpercentile, np.nanquantile])
def test_both_passed(self, func):
with warnings.catch_warnings():
# catch the warning, but make sure it does not raise:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"does not" <- "does"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning is given before the error is thrown. So we make sure the DeprecationWarning does not raise, yes. (In order to test the error that comes after the warning).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "catch the DeprecationWarning so that it does not raise".

@@ -3854,7 +3854,7 @@ def _median(a, axis=None, out=None, overwrite_input=False):


def _percentile_dispatcher(a, q, axis=None, out=None, overwrite_input=None,
interpolation=None, keepdims=None):
method=None, keepdims=None, *, interpolation=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. The change in order is probably safe. We should make the options keyword only at some point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NVM, this is private, so no problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the dispatch, so it is the same as the public signature. This is something __array_function__ implementers will have to catch up with.

I guess one annoyance is that it is hard to use avoid the warning if you are writing for both old and new NumPy versions. That would be an argument to delay the DeprecationWarning for a bit...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see how the first release goes.

method="linear",
keepdims=False,
*,
interpolation=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach, replace "interpolation" by "method" when used as positional arguments and only worry if "interpolation" is used as a keyword.

Co-authored-by: Charles Harris <charlesr.harris@gmail.com>
@seberg seberg force-pushed the rename-interpolation branch from 6130a87 to 546c47a Compare November 15, 2021 23:33
@charris charris merged commit 376ad69 into numpy:main Nov 16, 2021
@charris
Copy link
Member

charris commented Nov 16, 2021

Thanks Sebastian.

lithomas1 added a commit to lithomas1/numpy that referenced this pull request Nov 18, 2021
commit 30228578ac65ec6b90961700a5783c23d8e1b879
Author: Thomas Li <47963215+lithomas1@users.noreply.github.com>
Date:   Wed Nov 17 16:21:27 2021 -0800

    Add Windows config to GHA

    update script [wheel build]

    typo [wheel build]

    fix typo? [wheel build]

    fix linux builds? [wheel build]

    typo [wheel build]

    add license and pin to windows 2016

    skip tests [wheel build]

    pin to windows 2019 instead [wheel build]

    try to find out the error on windows [wheel build]

    maybe fix? [wheel build]

    maybe fix? [wheel build]

    fix? [wheel build]

    cleanup [wheel build]

    Add Windows config to GHA

    update script [wheel build]

    typo [wheel build]

    fix typo? [wheel build]

    fix linux builds? [wheel build]

    typo [wheel build]

    add license and pin to windows 2016

    skip tests [wheel build]

    pin to windows 2019 instead [wheel build]

    try to find out the error on windows [wheel build]

    maybe fix? [wheel build]

    maybe fix? [wheel build]

    fix? [wheel build]

    cleanup [wheel build]

    Add Windows config to GHA

    update script [wheel build]

    typo [wheel build]

    fix typo? [wheel build]

    fix linux builds? [wheel build]

    typo [wheel build]

    add license and pin to windows 2016

    skip tests [wheel build]

    pin to windows 2019 instead [wheel build]

    try to find out the error on windows [wheel build]

    maybe fix? [wheel build]

    maybe fix? [wheel build]

    fix? [wheel build]

    cleanup [wheel build]

    Update LICENSE_win32.txt

    Update LICENSE_win32.txt

    Add Windows config to GHA

    update script [wheel build]

    typo [wheel build]

    fix typo? [wheel build]

    fix linux builds? [wheel build]

    typo [wheel build]

    add license and pin to windows 2016

    skip tests [wheel build]

    pin to windows 2019 instead [wheel build]

    try to find out the error on windows [wheel build]

    maybe fix? [wheel build]

    maybe fix? [wheel build]

    fix? [wheel build]

    cleanup [wheel build]

    Update LICENSE_win32.txt

    Update LICENSE_win32.txt

    Add Windows config to GHA

    update script [wheel build]

    typo [wheel build]

    fix typo? [wheel build]

    fix linux builds? [wheel build]

    typo [wheel build]

    add license and pin to windows 2016

    skip tests [wheel build]

    pin to windows 2019 instead [wheel build]

    try to find out the error on windows [wheel build]

    maybe fix? [wheel build]

    maybe fix? [wheel build]

    fix? [wheel build]

    cleanup [wheel build]

    Add Windows config to GHA

    update script [wheel build]

    typo [wheel build]

    fix typo? [wheel build]

    fix linux builds? [wheel build]

    typo [wheel build]

    add license and pin to windows 2016

    skip tests [wheel build]

    pin to windows 2019 instead [wheel build]

    try to find out the error on windows [wheel build]

    maybe fix? [wheel build]

    maybe fix? [wheel build]

    fix? [wheel build]

    cleanup [wheel build]

    Update LICENSE_win32.txt

    Update LICENSE_win32.txt

    Add Windows config to GHA

    update script [wheel build]

    typo [wheel build]

    fix typo? [wheel build]

    fix linux builds? [wheel build]

    typo [wheel build]

    add license and pin to windows 2016

    skip tests [wheel build]

    pin to windows 2019 instead [wheel build]

    try to find out the error on windows [wheel build]

    maybe fix? [wheel build]

    maybe fix? [wheel build]

    fix? [wheel build]

    cleanup [wheel build]

    Update LICENSE_win32.txt

    Update LICENSE_win32.txt

commit 4bd12df
Author: Thomas Li <47963215+lithomas1@users.noreply.github.com>
Date:   Mon Nov 15 17:28:47 2021 -0800

    # This is a combination of 14 commits.
    # This is the 1st commit message:

    Add Windows config to GHA
    # This is the commit message numpy#2:

    update script [wheel build]
    # This is the commit message numpy#3:

    typo [wheel build]
    # This is the commit message numpy#4:

    fix typo? [wheel build]
    # This is the commit message numpy#5:

    fix linux builds? [wheel build]
    # This is the commit message numpy#6:

    typo [wheel build]
    # This is the commit message numpy#7:

    add license and pin to windows 2016
    # This is the commit message numpy#8:

    skip tests [wheel build]
    # This is the commit message numpy#9:

    pin to windows 2019 instead [wheel build]
    # This is the commit message numpy#10:

    try to find out the error on windows [wheel build]
    # This is the commit message numpy#11:

    maybe fix? [wheel build]
    # This is the commit message numpy#12:

    maybe fix? [wheel build]
    # This is the commit message numpy#13:

    fix? [wheel build]
    # This is the commit message numpy#14:

    cleanup [wheel build]

commit 444a721
Merge: 376ad69 22448b4
Author: Charles Harris <charlesr.harris@gmail.com>
Date:   Mon Nov 15 17:47:23 2021 -0700

    Merge pull request numpy#20274 from h-vetinari/fix_15179

    TST: Some fixes & refactoring around glibc-dependent skips in test_umath.py

commit 376ad69
Merge: b75fe57 546c47a
Author: Charles Harris <charlesr.harris@gmail.com>
Date:   Mon Nov 15 17:31:41 2021 -0700

    Merge pull request numpy#20327 from seberg/rename-interpolation

    BUG,DEP: Fixup quantile/percentile and rename interpolation->method

commit 546c47a
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Mon Nov 15 16:13:50 2021 -0600

    DOC: Fixups for interpolation rename comments from review

    Co-authored-by: Charles Harris <charlesr.harris@gmail.com>

commit b75fe57
Merge: 7310c09 cbc25d2
Author: Warren Weckesser <warren.weckesser@gmail.com>
Date:   Mon Nov 15 17:27:08 2021 -0500

    Merge pull request numpy#20369 from bilderbuchi/missing_diagnostics_newlines

    MAINT: Fix newlines in diagnostics output of numpy.f2py.

commit cbc25d2
Author: Christoph Buchner <bilderbuchi@phononoia.at>
Date:   Sun Nov 14 08:36:03 2021 +0100

    MAINT: Fix newlines in diagnostics output of numpy.f2py.

    Linebreaks were not consistently added to errmess/outmess
    arguments, which led to very long lines and wrong
    concatenation with compiler messages in f2py console output.

commit be15716
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Fri Nov 12 12:10:20 2021 -0600

    DOC: Add release not for quantile `interpolation` rename

    Also updates the old release note to include the info (and generally
    tweaks it a bit)

commit 7d8a8e7
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Fri Nov 12 11:57:22 2021 -0600

    DOC: Update percentile/quantile docs

    Mainly fixes the method list slightly, tones down the warning a
    bit and fixes the link to the paper (I did not realize that the
    link failed to work due only because the reference was missing
    from nanquantile/nanpercentile).

commit 5bd71fb
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Tue Nov 9 09:48:59 2021 -0600

    DOC: Add ticks to quantile interpolation/method error

    Co-authored-by: abel <aoun@cerfacs.fr>

commit 0d5fb81
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Mon Nov 8 20:39:50 2021 -0600

    DOC: Remove reference to paper from quantile `method` kwarg

    Apparently, sphinx does not resolve references to footnotes from
    parameter descriptions.

commit 8437663
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Mon Nov 8 18:25:37 2021 -0600

    MAINT: Rename interpolation to method in percentile stubs

commit 8cfb6b5
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Mon Nov 8 16:47:27 2021 -0600

    TST: Add deprecation testcase for quantile interpolation rename

commit a5ac5a5
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Mon Nov 8 16:41:24 2021 -0600

    DOC: Fixup the percentile methods plot

commit 85f3dda
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Mon Nov 8 16:37:41 2021 -0600

    BUG: quantile discrete methods ended up using -1 as index sometimes

    Also, the closest-observation did not correctly support multiple
    quantiles calculated at the same time (broadcasting error).

commit 3993408
Author: Sebastian Berg <sebastian@sipsolutions.net>
Date:   Mon Nov 8 15:38:30 2021 -0600

    API,DEP: Rename percentile/quantile `interpolation=` to `method=`

commit 22448b4
Author: H. Vetinari <h.vetinari@gmx.com>
Date:   Tue Nov 2 16:51:59 2021 +1100

    TST: parametrize glibc-version check in test_umath.py

commit 12923c2
Author: H. Vetinari <h.vetinari@gmx.com>
Date:   Tue Nov 2 14:21:01 2021 +1100

    TST: skip coverage of large elements in sincos_float32 for ancient glibc

    Fixes numpy#15179

commit 56268d5
Author: H. Vetinari <h.vetinari@gmx.com>
Date:   Tue Nov 2 14:19:24 2021 +1100

    TST: use existence of glibc-version to clean up a test

commit 01443e8
Author: H. Vetinari <h.vetinari@gmx.com>
Date:   Tue Nov 2 14:18:52 2021 +1100

    TST: turn glibc_older_than_2.17 into boolean instead of xfail

    this anticipates reuse of this boolean within the test module
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEP: Rename quantile/percentile interpolation kwarg to method
6 participants