-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Introduce new functions mean_std and mean_var. #23889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…and the mean: mean_std and the variance and the mean: mean_var. IN the current setup the mean is often calculated twice, this can be prevented if the functions calculating the variance of the standard deviation also return the mean,
…and the mean: mean_std and the variance and the mean: mean_var. IN the current setup the mean is often calculated twice, this can be prevented if the functions calculating the variance of the standard deviation also return the mean, Fix typo's, minor problems and add tests to increase coverage. Note: fromnumeric.py is littered with constructions like: ```Python if type(a) is not mu.ndarray: try: var = a.var except AttributeError: pass else: return var(axis=axis, dtype=dtype, out=out, ddof=ddof, **kwargs) ``` This looks like untested defensive coding, and because the data object is not passed to the method it is disfunctional. I copied it, but I think it can go. Which would be my preferred way of improving the code coverage here.
I don't know how to address the azure-pipeline problems. |
Chocolatey has deprecated the `--side-by-side` option and suggest either `--force` or `--allow-downgrade` instead. IIRC, we started with `--force` and it failed. (cherry picked from commit c055ee0)
See also the discussion on the mailing list and on the issue #23741. |
There was also a suggestion for an implementation using ufunc loops in a comment to issue 13199 |
The discussion on ufuncs is mostly independent. I moved most code from the _var function to _mean_var. The ufuncs discussion was focusing on rewriting _var, now that would have to become a rewrite of _mean_var and all the benefits of a rewrite would also benefit mean_var, var, std, mean_std and std as they all call _mean_var. |
Short note after today's triage review:
Also it would be good to see whether this has been discussed at the array API site: https://data-apis.org/array-api/latest/ |
The discussion on standardizing the statistical functions doesn't mention mean_std or mean_var or variants thereof. Also there is no mention of passing the mean into the calculation of var or std. It is mentioned that keywords can be somewhat freely added. Passing in the mean is consistent with that. @seberg Would it make sense to create an issue for this? |
No. First, it seems like its super low priority: I doubt there are a host of libraries itching to use it (or needing it). Second, nobody has it in practice, so there is no prior art for them to go on. |
For the solution as discussed at the triage review it makes sense (to me at least) to close this PR and start a new one referencing this one. To implement that solution based on what is included in this PR a lot of work needs to be undone, while making those changes on top of the main branch requires in my view minor changes both in code and documentation. |
Instead of this solution a new PR will be opened to add the mean keyword to var and std. |
ENH: Introduce two new functions that return:
Often when the standard deviation is needed the mean is also needed; the same holds for the variance and the mean. With the current code the mean is then calculated twice, this can be prevented if the functions calculating the variance or the standard deviation also return the obtained mean. See ticket #23741. These changes can be largely independent from those for ticket #13199, which suggests reducing memory usage by avoiding storing data-mean in a separate intermediate array.
Note:
fromnumeric.py is littered with constructions like:
This looks like untested defensive coding, and because the data object is not passed to the method it is probably dysfunctional. I copied it, but I think it can go. Which would be my preferred way of improving the code coverage here.In fact this is where the alternative implementation for MaskedArrays gets called.