Weighted quantile option in nanpercentile()

For our work we frequently need to compute weighted quantiles.  This is especially important when we need to weigh data from recent years more heavily in making predictions.

I've put together a function (called `weighted_quantile`) largely based on the source code of `np.percentile`.  It allows one to input weights along a single dimension, as a dict `w_dict`.  Below are some manual tests, using `xarray` as starting point:

When all weights = 1, it's identical to using `np.percentile`:
```python
>>> ar0
<xarray.DataArray (x: 3, y: 4)>
array([[3, 4, 8, 1],
       [5, 3, 7, 9],
       [4, 9, 6, 2]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 0 1 2 3
>>> np.percentile(ar0, q=[25, 50, 75], axis=-1)
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,1,1,1]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Now different weights:
```python
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,2,3,4.0]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 3.25    ,  5.666667,  4.333333],
       [ 4.      ,  7.      ,  5.333333],
       [ 6.      ,  8.      ,  6.75    ]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Also handles nan values like `np.nanpercentile`:
```python
>>> ar
<xarray.DataArray (x: 2, y: 2, z: 2)>
array([[[ nan,   3.],
        [ nan,   5.]],

       [[  8.,   1.],
        [ nan,   0.]]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 0 1
  * z        (z) int64 8 9
>>> da_stacked = ar.stack(mi=['x', 'y'])
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]})
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
>>>  np.nanpercentile(da_stacked, q=[25, 50, 75], axis=-1)
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
```

Lastly, different interpolation schemes are consistent:
```python
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]}, interpolation='nearest')
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> np.nanpercentile(da_stacked, q=[25, 50, 75], axis=-1, interpolation='nearest')
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
```

We wonder if it's ok to make this feature part of numpy, probably in `np.nanpercentile`?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Weighted quantile option in nanpercentile() #8935

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Weighted quantile option in nanpercentile() #8935

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions