BUG,DEP: Fixup quantile/percentile and rename interpolation->method #20327

seberg · 2021-11-08T23:01:32Z

This (unfortunately currently) does 3 things:

It renames and deprecates the interpolation kwarg as method
It fixes the documentation current plot to use the new methods (see below)
It fixes two small issues with some discrete methods and tests them by expanding the "monoticity" test case

The main method docstring is modified to:

    method : str, optional
        This parameter specifies the method to use for estimating the
        percentile.  There are many different methods, some unique to NumPy.
        See the notes for explanation.  The options aligning with the R types
        and the H&F paper are:

        * (H&F 1): 'inverted_cdf'
        * (H&F 2): 'averaged_inverted_cdf'
        * (H&F 3): 'closest_observation'
        * (H&F 4): 'interpolated_inverted_cdf'
        * (H&F 5): 'hazen'
        * (H&F 6): 'weibull'
        * (H&F 7): 'linear'  (default)
        * (H&F 8): 'median_unbiased'
        * (H&F 9): 'normal_unbiased'

        Mainly for compatibility reasons, NumPy also supports the following
        options which appear to be unique to NumPy:

        * 'lower'
        * 'higher',
        * 'midpoint'
        * 'nearest'

        .. versionchanged:: 1.22.0
            This argument was previously called "interpolation" and only
            offered the "linear" default and last four options.

Here is the plot, note that it is smaller in the artifact: https://23201-908607-gh.circle-artifacts.com/0/doc/build/html/reference/generated/numpy.percentile.html

(yeah, colors/linestyles are a bit annoying, some of the colors match, because they are somewhat related, I think)

Closes gh-20306.

seberg · 2021-11-08T23:02:42Z

numpy/lib/function_base.py

@@ -4445,19 +4492,22 @@ def _lerp(a, b, t, out=None):

 def _get_gamma_mask(shape, default_value, conditioned_value, where):
    out = np.full(shape, default_value)
-    out[where] = conditioned_value
+    np.copyto(out, conditioned_value, where=where, casting="unsafe")


Normally condition_value is just a scalar, so this doesn't matter. But in one of the branches, it is an array that matches the shape of out, so we have to use the copyto logic of copying both. (To be honest, the casting is just for sports right now.)

seberg · 2021-11-08T23:04:02Z

@bzah if you have some time, maybe you can also have a look?

BvB93

Can you also rename the interpolation keyword in the stub file?
With the way things are currently set up you'll only have to update np.percentile.

numpy/lib/function_base.py

bzah · 2021-11-10T17:36:35Z

@seberg maybe it would be interesting to have a separate graph with only the discontinuous methods ?
Especially because someone (not sure of the ping) said he have some use of numpy's specific "lower" and "higher" methods.

For the record, the discontinuous methods are: "inverted_cdf", "averaged_inverted_cdf", "closest_observation", and "lower", "higher", "nearest" and "midpoint".
By keeping only the continuous methods from your graph, it would make it a bit easier to read as well.

The names are also quite confusing, without a graph one could potentially expect the same results from averaged_inverted_cdf and midpoint as well as from nearest and closest_observation.

seberg · 2021-11-10T17:57:00Z

I guess I could do two plots, one with all the discrete and one with the non-discrete. Although I wonder if it isn't also nice to mix. Right, closest_observation seems the most confusing one. Frankly, I wonder if it isn't worse than "midpoint", but I guess it probably has its purposes. Should we consider naming it differently?

Wolfram says: observation number closest to q n

(wikipedia is similar) The tricky part is the q n, which I am not sure if it is obvious?

rossbar · 2021-11-10T18:12:28Z

I guess I could do two plots, one with all the discrete and one with the non-discrete.

I think there's merit in both approaches: having a single plot is nice because all of the information is collected in one place; however, as mentioned, the plot is quite busy. Two plots would have the advantage of being less busy, but the discontinuous distinction may not be particularly meaningful for the average user just trying to learn about the different methods.

FWIW I'd vote to add the plot as-is then evaluate alternatives in a follow-up PR to make comparison easier.

xor2k · 2021-11-10T21:30:34Z

Just found some other projects where they use the numpy-based notation, if this is of interest: https://www.tensorflow.org/probability/api_docs/python/tfp/stats/percentile
https://docs.cupy.dev/en/stable/reference/generated/cupy.quantile.html (a GPU version of Numpy)
https://docs.rapids.ai/api/cudf/nightly/api_docs/api/cudf.DataFrame.quantile.html (GPU-Pandas by Nvidia)
Interestingly, PyTorch only supports linear interplation:
https://pytorch.org/docs/stable/generated/torch.quantile.html

xor2k · 2021-11-10T22:45:14Z

In the Numpy Zoom session today I've mentioned that we use 'lower' and 'upper' interpolation. We typically do so to compensate a bias and ensure that we obtain a value which has actually been measured (especially a value which is integral if all measurements are integral). If there are no NaNs, this can also be achieved by sorting along one axis and picking the target value, e.g. array entry 3 if there are 4 entries if one wants interpolation='upper' and q=0.5 ("slightly above median", especially for larger arrays). It would also be possible with interpolation='linear' and picking q=(3-1)/(4-1) = 2/3, but this could yield some numerical instability, with 2/3 being a good example.

Probably changing our codebase to np.sort would be more appropriate (and not too much of an effort) anyway. I was maybe a little bit too worried my favorite (interpolation) methods would go away in the future ;)

Also, the closest-observation did not correctly support multiple quantiles calculated at the same time (broadcasting error).

Apparently, sphinx does not resolve references to footnotes from parameter descriptions.

Co-authored-by: abel <aoun@cerfacs.fr>

Mainly fixes the method list slightly, tones down the warning a bit and fixes the link to the paper (I did not realize that the link failed to work due only because the reference was missing from nanquantile/nanpercentile).

Also updates the old release note to include the info (and generally tweaks it a bit)

seberg · 2021-11-12T18:14:06Z

OK, updated, hopefully pretty much final, but especially release note bike shedding very welcome :):

tweaked the docstring, slightly less discouraging now (but still discouraging in the first paragraph which leads with "unique to NumPy"). Mention that some methods are discontinuous and all of the "bonus" ones are.
Tweaked the current release note.
Added a second deprecation release note.
Readded the reference to the paper to link from it, and use a direct enumeration.

seberg · 2021-11-12T18:24:41Z

Btw. I won't worry about projects like tensorflow personally, they should be able to keep up. We can't keep potentially serious flaws in what methods we offer stand in the way.
As for pandas, I pinged them, and they seem to fine with the deprecation.

charris · 2021-11-15T22:04:47Z

numpy/core/tests/test_deprecations.py

+            [np.percentile, np.quantile, np.nanpercentile, np.nanquantile])
+    def test_both_passed(self, func):
+        with warnings.catch_warnings():
+            # catch the warning, but make sure it does not raise:


"does not" <- "does"?

The warning is given before the error is thrown. So we make sure the DeprecationWarning does not raise, yes. (In order to test the error that comes after the warning).

Maybe "catch the DeprecationWarning so that it does not raise".

numpy/lib/function_base.py

charris · 2021-11-15T22:08:18Z

numpy/lib/function_base.py

@@ -3854,7 +3854,7 @@ def _median(a, axis=None, out=None, overwrite_input=False):


 def _percentile_dispatcher(a, q, axis=None, out=None, overwrite_input=None,
-                           interpolation=None, keepdims=None):
+                           method=None, keepdims=None, *, interpolation=None):


Hmm. The change in order is probably safe. We should make the options keyword only at some point.

NVM, this is private, so no problem.

This is the dispatch, so it is the same as the public signature. This is something __array_function__ implementers will have to catch up with.

I guess one annoyance is that it is hard to use avoid the warning if you are writing for both old and new NumPy versions. That would be an argument to delay the DeprecationWarning for a bit...

Let's see how the first release goes.

charris · 2021-11-15T22:13:19Z

numpy/lib/function_base.py

+               method="linear",
+               keepdims=False,
+               *,
+               interpolation=None):


Interesting approach, replace "interpolation" by "method" when used as positional arguments and only worry if "interpolation" is used as a keyword.

Co-authored-by: Charles Harris <charlesr.harris@gmail.com>

charris · 2021-11-16T00:31:54Z

Thanks Sebastian.

commit 30228578ac65ec6b90961700a5783c23d8e1b879 Author: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Wed Nov 17 16:21:27 2021 -0800 Add Windows config to GHA update script [wheel build] typo [wheel build] fix typo? [wheel build] fix linux builds? [wheel build] typo [wheel build] add license and pin to windows 2016 skip tests [wheel build] pin to windows 2019 instead [wheel build] try to find out the error on windows [wheel build] maybe fix? [wheel build] maybe fix? [wheel build] fix? [wheel build] cleanup [wheel build] Add Windows config to GHA update script [wheel build] typo [wheel build] fix typo? [wheel build] fix linux builds? [wheel build] typo [wheel build] add license and pin to windows 2016 skip tests [wheel build] pin to windows 2019 instead [wheel build] try to find out the error on windows [wheel build] maybe fix? [wheel build] maybe fix? [wheel build] fix? [wheel build] cleanup [wheel build] Add Windows config to GHA update script [wheel build] typo [wheel build] fix typo? [wheel build] fix linux builds? [wheel build] typo [wheel build] add license and pin to windows 2016 skip tests [wheel build] pin to windows 2019 instead [wheel build] try to find out the error on windows [wheel build] maybe fix? [wheel build] maybe fix? [wheel build] fix? [wheel build] cleanup [wheel build] Update LICENSE_win32.txt Update LICENSE_win32.txt Add Windows config to GHA update script [wheel build] typo [wheel build] fix typo? [wheel build] fix linux builds? [wheel build] typo [wheel build] add license and pin to windows 2016 skip tests [wheel build] pin to windows 2019 instead [wheel build] try to find out the error on windows [wheel build] maybe fix? [wheel build] maybe fix? [wheel build] fix? [wheel build] cleanup [wheel build] Update LICENSE_win32.txt Update LICENSE_win32.txt Add Windows config to GHA update script [wheel build] typo [wheel build] fix typo? [wheel build] fix linux builds? [wheel build] typo [wheel build] add license and pin to windows 2016 skip tests [wheel build] pin to windows 2019 instead [wheel build] try to find out the error on windows [wheel build] maybe fix? [wheel build] maybe fix? [wheel build] fix? [wheel build] cleanup [wheel build] Add Windows config to GHA update script [wheel build] typo [wheel build] fix typo? [wheel build] fix linux builds? [wheel build] typo [wheel build] add license and pin to windows 2016 skip tests [wheel build] pin to windows 2019 instead [wheel build] try to find out the error on windows [wheel build] maybe fix? [wheel build] maybe fix? [wheel build] fix? [wheel build] cleanup [wheel build] Update LICENSE_win32.txt Update LICENSE_win32.txt Add Windows config to GHA update script [wheel build] typo [wheel build] fix typo? [wheel build] fix linux builds? [wheel build] typo [wheel build] add license and pin to windows 2016 skip tests [wheel build] pin to windows 2019 instead [wheel build] try to find out the error on windows [wheel build] maybe fix? [wheel build] maybe fix? [wheel build] fix? [wheel build] cleanup [wheel build] Update LICENSE_win32.txt Update LICENSE_win32.txt commit 4bd12df Author: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Mon Nov 15 17:28:47 2021 -0800 # This is a combination of 14 commits. # This is the 1st commit message: Add Windows config to GHA # This is the commit message numpy#2: update script [wheel build] # This is the commit message numpy#3: typo [wheel build] # This is the commit message numpy#4: fix typo? [wheel build] # This is the commit message numpy#5: fix linux builds? [wheel build] # This is the commit message numpy#6: typo [wheel build] # This is the commit message numpy#7: add license and pin to windows 2016 # This is the commit message numpy#8: skip tests [wheel build] # This is the commit message numpy#9: pin to windows 2019 instead [wheel build] # This is the commit message numpy#10: try to find out the error on windows [wheel build] # This is the commit message numpy#11: maybe fix? [wheel build] # This is the commit message numpy#12: maybe fix? [wheel build] # This is the commit message numpy#13: fix? [wheel build] # This is the commit message numpy#14: cleanup [wheel build] commit 444a721 Merge: 376ad69 22448b4 Author: Charles Harris <charlesr.harris@gmail.com> Date: Mon Nov 15 17:47:23 2021 -0700 Merge pull request numpy#20274 from h-vetinari/fix_15179 TST: Some fixes & refactoring around glibc-dependent skips in test_umath.py commit 376ad69 Merge: b75fe57 546c47a Author: Charles Harris <charlesr.harris@gmail.com> Date: Mon Nov 15 17:31:41 2021 -0700 Merge pull request numpy#20327 from seberg/rename-interpolation BUG,DEP: Fixup quantile/percentile and rename interpolation->method commit 546c47a Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Mon Nov 15 16:13:50 2021 -0600 DOC: Fixups for interpolation rename comments from review Co-authored-by: Charles Harris <charlesr.harris@gmail.com> commit b75fe57 Merge: 7310c09 cbc25d2 Author: Warren Weckesser <warren.weckesser@gmail.com> Date: Mon Nov 15 17:27:08 2021 -0500 Merge pull request numpy#20369 from bilderbuchi/missing_diagnostics_newlines MAINT: Fix newlines in diagnostics output of numpy.f2py. commit cbc25d2 Author: Christoph Buchner <bilderbuchi@phononoia.at> Date: Sun Nov 14 08:36:03 2021 +0100 MAINT: Fix newlines in diagnostics output of numpy.f2py. Linebreaks were not consistently added to errmess/outmess arguments, which led to very long lines and wrong concatenation with compiler messages in f2py console output. commit be15716 Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Fri Nov 12 12:10:20 2021 -0600 DOC: Add release not for quantile `interpolation` rename Also updates the old release note to include the info (and generally tweaks it a bit) commit 7d8a8e7 Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Fri Nov 12 11:57:22 2021 -0600 DOC: Update percentile/quantile docs Mainly fixes the method list slightly, tones down the warning a bit and fixes the link to the paper (I did not realize that the link failed to work due only because the reference was missing from nanquantile/nanpercentile). commit 5bd71fb Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Tue Nov 9 09:48:59 2021 -0600 DOC: Add ticks to quantile interpolation/method error Co-authored-by: abel <aoun@cerfacs.fr> commit 0d5fb81 Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Mon Nov 8 20:39:50 2021 -0600 DOC: Remove reference to paper from quantile `method` kwarg Apparently, sphinx does not resolve references to footnotes from parameter descriptions. commit 8437663 Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Mon Nov 8 18:25:37 2021 -0600 MAINT: Rename interpolation to method in percentile stubs commit 8cfb6b5 Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Mon Nov 8 16:47:27 2021 -0600 TST: Add deprecation testcase for quantile interpolation rename commit a5ac5a5 Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Mon Nov 8 16:41:24 2021 -0600 DOC: Fixup the percentile methods plot commit 85f3dda Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Mon Nov 8 16:37:41 2021 -0600 BUG: quantile discrete methods ended up using -1 as index sometimes Also, the closest-observation did not correctly support multiple quantiles calculated at the same time (broadcasting error). commit 3993408 Author: Sebastian Berg <sebastian@sipsolutions.net> Date: Mon Nov 8 15:38:30 2021 -0600 API,DEP: Rename percentile/quantile `interpolation=` to `method=` commit 22448b4 Author: H. Vetinari <h.vetinari@gmx.com> Date: Tue Nov 2 16:51:59 2021 +1100 TST: parametrize glibc-version check in test_umath.py commit 12923c2 Author: H. Vetinari <h.vetinari@gmx.com> Date: Tue Nov 2 14:21:01 2021 +1100 TST: skip coverage of large elements in sincos_float32 for ancient glibc Fixes numpy#15179 commit 56268d5 Author: H. Vetinari <h.vetinari@gmx.com> Date: Tue Nov 2 14:19:24 2021 +1100 TST: use existence of glibc-version to clean up a test commit 01443e8 Author: H. Vetinari <h.vetinari@gmx.com> Date: Tue Nov 2 14:18:52 2021 +1100 TST: turn glibc_older_than_2.17 into boolean instead of xfail this anticipates reuse of this boolean within the test module

github-actions bot added the 00 - Bug label Nov 8, 2021

seberg commented Nov 8, 2021

View reviewed changes

seberg force-pushed the rename-interpolation branch 2 times, most recently from 15b731c to bf047da Compare November 8, 2021 23:23

BvB93 reviewed Nov 9, 2021

View reviewed changes

seberg force-pushed the rename-interpolation branch from 61ab79b to c9f06ad Compare November 9, 2021 00:47

bzah approved these changes Nov 9, 2021

View reviewed changes

numpy/lib/function_base.py Outdated Show resolved Hide resolved

seberg added this to the 1.22.0 release milestone Nov 10, 2021

seberg and others added 9 commits November 12, 2021 12:11

API,DEP: Rename percentile/quantile interpolation= to method=

3993408

BUG: quantile discrete methods ended up using -1 as index sometimes

85f3dda

Also, the closest-observation did not correctly support multiple quantiles calculated at the same time (broadcasting error).

DOC: Fixup the percentile methods plot

a5ac5a5

TST: Add deprecation testcase for quantile interpolation rename

8cfb6b5

MAINT: Rename interpolation to method in percentile stubs

8437663

DOC: Remove reference to paper from quantile method kwarg

0d5fb81

Apparently, sphinx does not resolve references to footnotes from parameter descriptions.

DOC: Add ticks to quantile interpolation/method error

5bd71fb

Co-authored-by: abel <aoun@cerfacs.fr>

DOC: Update percentile/quantile docs

7d8a8e7

Mainly fixes the method list slightly, tones down the warning a bit and fixes the link to the paper (I did not realize that the link failed to work due only because the reference was missing from nanquantile/nanpercentile).

DOC: Add release not for quantile interpolation rename

be15716

Also updates the old release note to include the info (and generally tweaks it a bit)

seberg force-pushed the rename-interpolation branch from f3d8e28 to be15716 Compare November 12, 2021 18:12

charris reviewed Nov 15, 2021

View reviewed changes

numpy/lib/function_base.py Outdated Show resolved Hide resolved

charris reviewed Nov 15, 2021

View reviewed changes

DOC: Fixups for interpolation rename comments from review

546c47a

Co-authored-by: Charles Harris <charlesr.harris@gmail.com>

seberg force-pushed the rename-interpolation branch from 6130a87 to 546c47a Compare November 15, 2021 23:33

charris merged commit 376ad69 into numpy:main Nov 16, 2021

mathause mentioned this pull request Dec 25, 2021

quantile: rename interpolation arg to method pydata/xarray#6108

Merged

3 tasks

Illviljan mentioned this pull request Dec 27, 2021

dask.array.percentile: rename interpolation arg to method dask/dask#8518

Closed

neutrinoceros mentioned this pull request Feb 12, 2024

BUG: fix signature incompatibilities between helper and helped functions (astropy.utils.masked) astropy/astropy#16025

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG,DEP: Fixup quantile/percentile and rename interpolation->method #20327

BUG,DEP: Fixup quantile/percentile and rename interpolation->method #20327

seberg commented Nov 8, 2021 •

edited

Loading

seberg Nov 8, 2021

seberg commented Nov 8, 2021

BvB93 left a comment

bzah commented Nov 10, 2021

seberg commented Nov 10, 2021

rossbar commented Nov 10, 2021

xor2k commented Nov 10, 2021 •

edited

Loading

xor2k commented Nov 10, 2021

seberg commented Nov 12, 2021

seberg commented Nov 12, 2021

charris Nov 15, 2021

seberg Nov 15, 2021

charris Nov 15, 2021

charris Nov 15, 2021

charris Nov 15, 2021

seberg Nov 15, 2021

charris Nov 16, 2021

charris Nov 15, 2021

charris commented Nov 16, 2021

BUG,DEP: Fixup quantile/percentile and rename interpolation->method #20327

BUG,DEP: Fixup quantile/percentile and rename interpolation->method #20327

Conversation

seberg commented Nov 8, 2021 • edited Loading

Choose a reason for hiding this comment

seberg commented Nov 8, 2021

BvB93 left a comment

Choose a reason for hiding this comment

bzah commented Nov 10, 2021

seberg commented Nov 10, 2021

rossbar commented Nov 10, 2021

xor2k commented Nov 10, 2021 • edited Loading

xor2k commented Nov 10, 2021

seberg commented Nov 12, 2021

seberg commented Nov 12, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

charris commented Nov 16, 2021

seberg commented Nov 8, 2021 •

edited

Loading

xor2k commented Nov 10, 2021 •

edited

Loading