Skip to content

ENH: Add weights option to numpy.quantile #16862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

iii-org-tw
Copy link

@iii-org-tw iii-org-tw commented Jul 14, 2020

  • w must have the same shape with a.shape or a[axis] when axis is int or None.

  • When w.shape == a.shape, it's the weights of each samples in a.

  • If w.shape != a.shape, it's the weights for each dimensions in a.

    For example,

          a = np.array([[1,2,3],[4,5,6]])
    
          w = np.array([10,11,12])
    
          axis = 1
    

    both weights of [1,2,3] and [4,5,6] are [10,11,12]

@eric-wieser
Copy link
Member

eric-wieser commented Jul 14, 2020

This was proposed before in #9211. The difficulty came in exactly defining how to apply the weights (gh-10736).

@iii-org-tw
Copy link
Author

iii-org-tw commented Jul 15, 2020

@eric-wieser
We have seen at least three types of formula to compute weighted quantiles.

We implement the third formula for the following reasons:

  1. The same formula is adapted by R wtd quantile. (https://www.rdocumentation.org/packages/reldist/versions/1.6-6/topics/wtd.quantile)
  2. This formula is well defined and can produce the same result as np.quantile when all value share the same weight.
  3. As current np.quantile has different interpolation methods, we see that this formula can satisfy people's need as long as we clarify how we calculate weighted quantile.

@charris charris changed the title Add weights option to numpy.quantile ENH: Add weights option to numpy.quantile Jul 15, 2020
@charris
Copy link
Member

charris commented Jul 15, 2020

You have picked up a bunch of extraneous merges from master (19 commits), might want to rebase on master.

@iii-org-tw iii-org-tw force-pushed the weighted_quantile branch 3 times, most recently from 9d46db7 to 3587c85 Compare July 20, 2020 20:09
@iii-org-tw
Copy link
Author

Hi @eric-wieser

What can we do to help getting this PR accepted?

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To move forward, I think we need to first come to a conclusion about gh-10736

@mattip mattip added the triage review Issue/PR to be discussed at the next triage meeting label Aug 2, 2020
Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss it at this week's triage meeting

@WarrenWeckesser WarrenWeckesser self-assigned this Aug 26, 2020
@seberg seberg added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Aug 26, 2020
Base automatically changed from master to main March 4, 2021 02:05
@charris
Copy link
Member

charris commented Apr 12, 2021

close/reopen

@charris charris closed this Apr 12, 2021
@charris charris reopened this Apr 12, 2021
@charris
Copy link
Member

charris commented Apr 12, 2021

There are a lot of lint errors that need fixing. Hit details to see them. @mattip, do you recall the triage results?

@mattip
Copy link
Member

mattip commented Apr 13, 2021

No I don't remember and the meeting notes do not mention it. I think though that @WarrenWeckesser self-assigned this?

@mattip mattip added triage review Issue/PR to be discussed at the next triage meeting and removed triaged Issue/PR that was discussed in a triage meeting labels Apr 13, 2021
@WarrenWeckesser WarrenWeckesser removed their assignment Nov 10, 2021
@bzah bzah mentioned this pull request Jan 19, 2022
4 tasks
@seberg
Copy link
Member

seberg commented Jan 26, 2022

We discussed this in the Triage meeting a bit, and settled on "we want this". We are OK with the sorting approach and limiting e.g. only to aweights, that may need settling 100%, but generally using aweights (and maybe fweights) sems good.

@iii-org-tw
Copy link
Author

We discussed this in the Triage meeting a bit, and settled on "we want this". We are OK with the sorting approach and limiting e.g. only to aweights, that may need settling 100%, but generally using aweights (and maybe fweights) sems good.

So what should we do to make it merged?
Only to resolve the conflicts?

@seberg
Copy link
Member

seberg commented Feb 25, 2022

@iii-org-tw there are still a fewmoving parts here, rebasing is indeed the most important one. I also don't think the current API is quite what we need, so a proposal to pin it down further would be nice (should we name it aweights(?); maybe make it keyword only argument with , *, ...).
Of course we can pin it down/follow up later, a start would be helpful, though. Browsing through, it also seems odd that there is manual binary search code, I think. Can't we use np.searchsorted?

@xiaopi-ouo xiaopi-ouo force-pushed the weighted_quantile branch 2 times, most recently from 9078c53 to eb22b6d Compare March 20, 2022 19:35
@iii-org-tw
Copy link
Author

I run tests locally and got no error but after I push the code, the Refguide Check failed.
Here is the details.

2022-03-20T19:57:01.3135540Z ##[section]Starting: Run Refguide Check
2022-03-20T19:57:01.3145860Z ==============================================================================
2022-03-20T19:57:01.3146570Z Task         : Command line
2022-03-20T19:57:01.3147240Z Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
2022-03-20T19:57:01.3147780Z Version      : 2.200.2
2022-03-20T19:57:01.3148240Z Author       : Microsoft Corporation
2022-03-20T19:57:01.3148760Z Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
2022-03-20T19:57:01.3149360Z ==============================================================================
2022-03-20T19:57:01.4919970Z Generating script.
2022-03-20T19:57:01.4933150Z Script contents:
2022-03-20T19:57:01.4934200Z python runtests.py -g --refguide-check
2022-03-20T19:57:01.4935230Z ========================== Starting Command Output ===========================
2022-03-20T19:57:01.4968100Z [command]/bin/bash --noprofile --norc /Users/runner/work/_temp/47852ec2-6c7f-4cc8-9f21-1c0342ab7f28.sh
2022-03-20T19:58:10.0310370Z numpy.core ............................................................................................................................................................................................................................................................................................................................................................Warning: overflow encountered in short_scalars
2022-03-20T19:58:16.1473440Z .........................
2022-03-20T19:58:16.1552920Z numpy.f2py .....
2022-03-20T19:58:16.8775710Z numpy.linalg ............................................
2022-03-20T19:58:18.0608180Z numpy.lib ......................................................................F...................................................................................................
2022-03-20T19:58:18.1139120Z numpy.lib.recfunctions .......................
2022-03-20T19:58:18.2818590Z numpy.fft ....................
2022-03-20T19:58:19.3505970Z numpy.ma .................................................................................................................................................................................................................................
2022-03-20T19:58:19.5106950Z numpy.polynomial .........
2022-03-20T19:58:19.5507870Z numpy.matrixlib ......
2022-03-20T19:58:58.1827610Z numpy.random .........................................................
2022-03-20T19:58:58.5037610Z numpy.testing ...........................................
2022-03-20T19:58:58.5041000Z Running checks for 11 modules:
2022-03-20T19:58:58.5042070Z 
2022-03-20T19:58:58.5042620Z =========
2022-03-20T19:58:58.5044680Z numpy.lib
2022-03-20T19:58:58.5045170Z =========
2022-03-20T19:58:58.5045450Z 
2022-03-20T19:58:58.5045850Z numpy.lib.quantile
2022-03-20T19:58:58.5047140Z ------------------
2022-03-20T19:58:58.5047440Z 
2022-03-20T19:58:58.5048440Z File "build/testenv/lib/python3.8/site-packages/numpy/__init__.py", line 375, in quantile
2022-03-20T19:58:58.5049070Z Failed example:
2022-03-20T19:58:58.5049560Z     np.quantile(a, 0.5, aweights=w, axis=0)
2022-03-20T19:58:58.5050020Z Exception raised:
2022-03-20T19:58:58.5050480Z     Traceback (most recent call last):
2022-03-20T19:58:58.5051080Z       File "/Users/runner/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/doctest.py", line 1336, in __run
2022-03-20T19:58:58.5051950Z         exec(compile(example.source, filename, "single",
2022-03-20T19:58:58.5052570Z       File "<doctest quantile[7]>", line 1, in <module>
2022-03-20T19:58:58.5053140Z         np.quantile(a, 0.5, aweights=w, axis=0)
2022-03-20T19:58:58.5053750Z       File "<__array_function__ internals>", line 180, in quantile
2022-03-20T19:58:58.5055040Z       File "/Users/runner/work/1/s/build/testenv/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4479, in quantile
2022-03-20T19:58:58.5055820Z         return _quantile_unchecked(
2022-03-20T19:58:58.5057020Z       File "/Users/runner/work/1/s/build/testenv/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4501, in _quantile_unchecked
2022-03-20T19:58:58.5059150Z         r, k = _weighted_ureduce(a,
2022-03-20T19:58:58.5060370Z       File "/Users/runner/work/1/s/build/testenv/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4205, in _weighted_ureduce
2022-03-20T19:58:58.5061170Z         r = func(a, aweights, **kwargs)
2022-03-20T19:58:58.5062380Z       File "/Users/runner/work/1/s/build/testenv/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4737, in _weighted_quantile_ureduce_func
2022-03-20T19:58:58.5063310Z         wp = np.take_along_axis(wp, sorted_index, axis=0)
2022-03-20T19:58:58.5063960Z       File "<__array_function__ internals>", line 180, in take_along_axis
2022-03-20T19:58:58.5065270Z       File "/Users/runner/work/1/s/build/testenv/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 170, in take_along_axis
2022-03-20T19:58:58.5066230Z         return arr[_make_along_axis_idx(arr_shape, indices, axis)]
2022-03-20T19:58:58.5067480Z       File "/Users/runner/work/1/s/build/testenv/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 34, in _make_along_axis_idx
2022-03-20T19:58:58.5068400Z         raise ValueError(
2022-03-20T19:58:58.5068980Z     ValueError: `indices` and `arr` must have the same number of dimensions
2022-03-20T19:58:58.5069380Z 
2022-03-20T19:58:58.5069620Z 
2022-03-20T19:58:58.5070080Z ERROR:  failed checking numpy.lib
2022-03-20T19:58:58.6500040Z ##[error]Bash exited with code '1'.
2022-03-20T19:58:58.6519370Z ##[section]Finishing: Run Refguide Check

I don't know what's wrong because I can get the right result when I run np.quantile(a, 0.5, aweights=w, axis=0) locally.

@seberg
Copy link
Member

seberg commented Mar 21, 2022

@iii-org-tw I don't immediately remember how to run the refguide check locally best. But, the example uses a = np.array([[10, 7, 4], [3, 2, 1]]) and you have w = np.ones(2). I get the same error locally, and I think it is probably correct.

For the weights to broadcast with a, they should be np.ones(3) or np.ones((2, 1))?

@iii-org-tw
Copy link
Author

@seberg
yup, thank you, I did not notice that w was re-assigned with another value. The docstring was wrong and have been modified.

Now, I have rebased the branch and solved conflicts. A problem is now we add aweights to nanquantile and nanpercentile JUST FOR PASSING THE TESTS (they should have the same signature with quantile and percentile). As nanquantile removes nan by _remove_nan_1d(arr1d), elements in aweights should also be removed if the corresponding values in arr1d will be removed. Adding aweights to _remove_nan_1d(arr1d) as a new paramter seems not a good idea since not all functions will use aweights. However, writing another function makes duplicated code. What do you think?

@seberg seberg mentioned this pull request Mar 28, 2022
@InessaPawson InessaPawson removed the triage review Issue/PR to be discussed at the next triage meeting label Apr 20, 2022
@seberg
Copy link
Member

seberg commented Jan 13, 2024

Closing, gh-24254 will probably happen, although I am sure it is not the same as this was proposing.

@seberg seberg closed this Jan 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

9 participants