group mean over several axes of array with nans is wrong #1118

gdementen opened this issue Oct 16, 2024 · 1 comment
gdementen commented Oct 16, 2024

What happens is that the mean is computed over each axis in turn (a mean of means). When no nans are involved, this is mathematically equivalent to a single mean over all axes; in practice we lose some precision, but that was deemed acceptable so far. However, when nans are involved, the result is significantly wrong.

>>> arr = Array([[1, 3], [4, nan]], [Axis('a=a0,a1'), Axis('b=b0,b1')])
>>> arr
a\b   b0   b1
 a0  1.0  3.0
 a1  4.0  nan
>>> arr.mean("a0,a1 >> a01", "b0,b1 >> b01")
2.75

The correct result is 2.6666... What it actually computes is:

>>> (((1 + 4) / 2) + ((3 + 0) / 1)) / 2
2.75
>>> 1/4 + 4/4 + 3/2
2.75

Instead of:

>>> (1 + 4 + 3) / 3
2.6666666666666665
>>> 1/3 + 4/3 + 3/3
2.6666666666666665

As a workaround until larray 0.35 is released, I have recommended using:

>>> # TODO: drop this function once larray 0.35 is available
... def nd_mean(array, axes_or_groups):
...     """
...     Compute the mean of array over axes_or_groups.
...
...     This function is temporarily necessary because larray versions up to (and including) 0.34.x
...     give wrong results when computing means over groups spanning several dimensions
...     when some values are nan. See https://github.com/larray-project/larray/issues/1118
...     """
...     return array.sum(*axes_or_groups) / (~isnan(array)).sum(*axes_or_groups)
>>> nd_mean(arr, ("a0,a1 >> a01", "b0,b1 >> b01"))
2.6666666666666665
gdementen commented
Here is a version which does not emit a warning on all-nan slices. I am unsure it is a good idea to do this by default though, so I added a "warn_all_nan_slices" argument defaulting to True. I fear our users will always want it to be False, but they will then miss helpful warnings when there is an actual problem with their data, so I am unsure the argument makes sense at all. What I do know is that defaulting it to False is not worth it, because nobody would use the argument unless they had already been bitten by a bad case of this.

# TODO: drop this function once larray 0.35 is available
from larray import isnan, nan, where  # names used below (also available via `from larray import *`)

def nd_mean(array, axes_or_groups, warn_all_nan_slices=True):
    """
    Compute the mean of array over axes_or_groups.

    This function is temporarily necessary because larray versions up to (and including) 0.34.x
    give wrong results when computing means over groups spanning several dimensions
    when some values are nan. See https://github.com/larray-project/larray/issues/1118
    """
    value_sums = array.sum(*axes_or_groups)
    counts = (~isnan(array)).sum(*axes_or_groups)
    if warn_all_nan_slices:
        return value_sums / counts
    else:
        return where(counts > 0, value_sums.divnot0(counts), nan)
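For readers without larray at hand, the same sum/count logic can be sketched in plain NumPy (`nd_nanmean` and its arguments are my own names, not larray API; `np.divide` with `out=`/`where=` plays the role of `divnot0`):

```python
import numpy as np

def nd_nanmean(values, axis=None, warn_all_nan_slices=True):
    # Sum the non-nan values and count them separately, then divide once,
    # instead of chaining per-axis means.
    sums = np.nansum(values, axis=axis)
    counts = (~np.isnan(values)).sum(axis=axis)
    if warn_all_nan_slices:
        # 0 / 0 on an all-nan slice emits a RuntimeWarning and yields nan
        return sums / counts
    # divnot0-style division: divide only where the count is non-zero,
    # leave nan elsewhere, without any warning
    return np.divide(sums, counts,
                     out=np.full_like(np.asarray(sums, dtype=float), np.nan),
                     where=counts > 0)

data = np.array([[1.0, np.nan],
                 [4.0, np.nan]])  # second column is all nan
```

With `warn_all_nan_slices=False`, the all-nan column silently yields nan while the valid column gets its correct mean.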
