-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
Closed
Description
For our work we frequently need to compute weighted quantiles. This is especially important when we need to weigh data from recent years more heavily in making predictions.
I've put together a function (called weighted_quantile
) largely based on the source code of np.percentile
. It allows one to input weights along a single dimension, as a dict w_dict
. Below are some manual tests, using xarray
as starting point:
When all weights = 1, it's identical to using np.percentile
:
>>> ar0
<xarray.DataArray (x: 3, y: 4)>
array([[3, 4, 8, 1],
[5, 3, 7, 9],
[4, 9, 6, 2]])
Coordinates:
* x (x) |S1 'a' 'b' 'c'
* y (y) int64 0 1 2 3
>>> np.percentile(ar0, q=[25, 50, 75], axis=-1)
array([[ 2.5 , 4.5 , 3.5 ],
[ 3.5 , 6. , 5. ],
[ 5. , 7.5 , 6.75]])
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,1,1,1]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 , 4.5 , 3.5 ],
[ 3.5 , 6. , 5. ],
[ 5. , 7.5 , 6.75]])
Coordinates:
* x (x) |S1 'a' 'b' 'c'
* quantile (quantile) float64 0.25 0.5 0.75
Now different weights:
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,2,3,4.0]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 3.25 , 5.666667, 4.333333],
[ 4. , 7. , 5.333333],
[ 6. , 8. , 6.75 ]])
Coordinates:
* x (x) |S1 'a' 'b' 'c'
* quantile (quantile) float64 0.25 0.5 0.75
Also handles nan values like np.nanpercentile
:
>>> ar
<xarray.DataArray (x: 2, y: 2, z: 2)>
array([[[ nan, 3.],
[ nan, 5.]],
[[ 8., 1.],
[ nan, 0.]]])
Coordinates:
* x (x) |S1 'a' 'b'
* y (y) int64 0 1
* z (z) int64 8 9
>>> da_stacked = ar.stack(mi=['x', 'y'])
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]})
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8. , 0.75],
[ 8. , 2. ],
[ 8. , 3.5 ]])
Coordinates:
* z (z) int64 8 9
* quantile (quantile) float64 0.25 0.5 0.75
>>> np.nanpercentile(da_stacked, q=[25, 50, 75], axis=-1)
array([[ 8. , 0.75],
[ 8. , 2. ],
[ 8. , 3.5 ]])
Lastly, different interpolation schemes are consistent:
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]}, interpolation='nearest')
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8., 1.],
[ 8., 3.],
[ 8., 3.]])
Coordinates:
* z (z) int64 8 9
* quantile (quantile) float64 0.25 0.5 0.75
>>> np.nanpercentile(da_stacked, q=[25, 50, 75], axis=-1, interpolation='nearest')
array([[ 8., 1.],
[ 8., 3.],
[ 8., 3.]])
We wonder if it's ok to make this feature part of numpy, probably in np.nanpercentile
?
amitport and yfarjoun