-
-
Notifications
You must be signed in to change notification settings - Fork 11k
DOC: Improve np.mean
documentation of the out argument
#25431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a generic problem with how numpy integer arrays behave with overflow, see #8987. If that were fixed, your code would generate a warning or error on overflow. Because this is a general problem with numpy integer arrays, I'm not sure it's very useful to only document it once in this function. It would be better to have a place in the dtype documentation that goes over this problem and then we can link to it from elsewhere. |
Agreed - I'm -1 on adding a blurb like this individually to every |
Sorry, I made a mistake in the example code. I had a I think the issue is not per-se related to the overflows being "silent" in numpy. I agree with you that if there had been a warning, I would have spent a lot less time finding out the issue. But iIt's specific to the way the Maybe, to clarify, here is a code that works:
From the current documentation, I assumed this code and the one in the description were equivalent, but would avoid creating an intermediary array of Moreover, typically, in most numpy functions, the Honestly, my first idea was also to document better this behaviour inside the ufunc documentation. However, I think their behaviour is totally normal and expected (assuming you're aware of under/overflows). The fact that |
Ufunc reductions also behave like this (e.g. So like we said above, this is a generic feature of ufunc reductions. If you specify an |
Yes. I do agree on this. I think it's logical that the result is stored in the How (and where) would you suggest to clarify it? I can adjust this PR accordingly. |
I personally don't see |
Nothing wrong with a bit of duplicate documentation, many may be unfamiliar with the larger idea of "reduction". |
I've considered this a bit more carefully, and I think a documentation improvement of some kind would be helpful.
I agree that one would expect the overflow of:
The documentation of Ufunc methods does state that However, the user does not necessarily know that
But it doesn't mention that I considered an analogous example, although one less likely to be encountered in the wild: numpy.std([[0, 1e20]], axis=1, dtype=np.float64)
# array([5.e+19])
numpy.std([[0, 1e20]], axis=1, dtype=np.float64).astype(np.float32)
# array([5.e+19], dtype=float32)
out = np.zeros((1,), dtype=np.float32)
numpy.std([[0, 1e20]], axis=1, dtype=np.float64, out=out)
# array([inf], dtype=float32) So there are at least a few other functions where this documentation could be added (not to mention There seems to be some support for linking to generic documentation. What do you think of explicitly stating in the documentation of reducing statistics that they behave like "reduce-like methods" to use the phrasing of: and link to https://numpy.org/doc/stable/user/basics.ufuncs.html#ufunc-methods? I think it is also worth mentioning in the Would that address the issue @pieleric? |
Yes @mdhaber, that would most likely have saved me time, as I did read the ufuncs documentation, hopping I'd find something that explain this behaviour. So it should help others too. BTW, to be precise, when
I will adjust this PR with a suggestion for the ufuncs documentation. |
Document the effect of using the out argument, which can cause under/overflow issues with integers dtype. Also correct the description of how out with dtype is handled in ufuncs in general.
@mdhaber , I've updated this PR according to your suggestions. Honestly, I struggled a bit to find a good way to describe the behaviour. Let me know if you can think of a better English description of the limitation 😄 I didn't update all the "reducing statistics" function yet, as I'd rather first hear you feedback before copy-pasting to all the functions' docstring. I've also taken the opportunity to correct the description of out and dtype together, as dtype is not ignored. |
Will merge when CI finishes. |
@@ -3505,6 +3505,7 @@ def mean(a, axis=None, dtype=None, out=None, keepdims=np._NoValue, *, | |||
is ``None``; if provided, it must have the same shape as the | |||
expected output, but the type will be cast if necessary. | |||
See :ref:`ufuncs-output-type` for more details. | |||
See :ref:`ufuncs-output-type` for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdhaber , this line is now duplicated.
@mdhaber , thanks for looking into this PR. Sorry for not reacting before. I was sick, and had to focus on a lot of other things with the little time I had. |
I've spent a couple of hours pulling my hair out on a computation that should work but didn't. Eventually, it turned out that was due to a limitation of the
np.mean()
function. I thought I'd document it, for others avoid spending too much time on it.Essentially, the limitation looks like that:
=> Document the effect of using the out argument, which can cause under/overflow issues with integers dtype.