Skip to content

ENH: add mean keyword to std and var #24126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Jul 7, 2023

Conversation

RonaldAJ
Copy link
Contributor

@RonaldAJ RonaldAJ commented Jul 5, 2023

Replaces previous pull request #23889 (ENH: Introduce new functions mean_std and mean_var.) after triage review.

Often when the standard deviation is needed the mean is also needed; the same holds for the variance and the mean. With the current code the mean is then calculated twice, this can be prevented if the functions calculating the variance or the standard deviation can use a precalculated mean. See ticket #23741.

These changes can be largely independent from those for ticket #13199, which suggests reducing memory usage by avoiding storing data-mean in a separate intermediate array.

See also: discussion on the mailing list

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a good solution! I also like that now one can put in something else than the mean from the same data (which is often useful). But see the important comment about subclasses, plus a lot of more nitpicky ones.

Note that I'm surprised MaskedArray does not work. Did you try again? Could you explain the problem? And is this a new problem, or did the functions not work even before? If they worked before, it would be much nicer to get MaskedArray right too.

@RonaldAJ
Copy link
Contributor Author

RonaldAJ commented Jul 6, 2023

Note that I'm surprised MaskedArray does not work. Did you try again? Could you explain the problem? And is this a new problem, or did the functions not work even before? If they worked before, it would be much nicer to get MaskedArray right too.

I now added support for "mean" keyword on MaskedArrays.

RonaldAJ added 3 commits July 6, 2023 12:13
(numpy#24126 (comment)) The dispatcher returns all arguments that can, in principle, contain something that is an array and hence that, if not a regular ndarray, can allow the function to be dealt with another package (say, dask). Since mean can be an array, it should be added (most similar to where).
@RonaldAJ
Copy link
Contributor Author

RonaldAJ commented Jul 6, 2023

@charris The cygwin test failed, but from the reported error I guess it has little to do with my change. Will it run again automagically?

@mattip
Copy link
Member

mattip commented Jul 6, 2023

I re-ran it manually. If it fails again we can ignore the failure.

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, @mhvk already did all the review. Just a few nits about the release note and docstrings.

@RonaldAJ
Copy link
Contributor Author

RonaldAJ commented Jul 6, 2023

The output of the examples depends on platform and circumstances. Therefore testing based on example code gives errors:

-------------- 2023-07-06T19:09:55.1262830Z 2023-07-06T19:09:55.1263570Z File "build/testenv/lib/python3.9/site-packages/numpy/__init__.py", line ?, in std 2023-07-06T19:09:55.1264100Z Failed example: 2023-07-06T19:09:55.1264890Z print(f'Percentage execution time saved {100*(t2-t1)/t2:.0f}%') 2023-07-06T19:09:55.1265550Z Expected: 2023-07-06T19:09:55.1266030Z Percentage execution time saved 30% 2023-07-06T19:09:55.1266470Z Got: 2023-07-06T19:09:55.1266880Z Percentage execution time saved 32% 2023-07-06T19:09:55.1267080Z 2023-07-06T19:09:55.1267310Z numpy.core.var 2023-07-06T19:09:55.1267770Z -------------- 2023-07-06T19:09:55.1267930Z 2023-07-06T19:09:55.1268440Z File "build/testenv/lib/python3.9/site-packages/numpy/__init__.py", line ?, in var 2023-07-06T19:09:55.1268800Z Failed example: 2023-07-06T19:09:55.1269300Z print(f'Percentage execution time saved {100*(t2-t1)/t2:.0f}%') 2023-07-06T19:09:55.1269630Z Expected: 2023-07-06T19:09:55.1275670Z Percentage execution time saved 32% 2023-07-06T19:09:55.1276120Z Got: 2023-07-06T19:09:55.1276900Z Percentage execution time saved 31%

Is there a method for handling this?

@RonaldAJ
Copy link
Contributor Author

RonaldAJ commented Jul 6, 2023

Added: #doctest: +SKIP

@mattip
Copy link
Member

mattip commented Jul 6, 2023

Is there a method for handling this?

You could do something like

# Using mean saves around 32% of the compute time
>>> time_saved = (t2-t1)/t2
>>> assert 0.3 < time_saved < 0.35

@RonaldAJ
Copy link
Contributor Author

RonaldAJ commented Jul 7, 2023

Is there a method for handling this?

You could do something like

# Using mean saves around 32% of the compute time
>>> time_saved = (t2-t1)/t2
>>> assert 0.3 < time_saved < 0.35

#doctest: +SKIP did the trick

@mattip mattip merged commit ab2178b into numpy:main Jul 7, 2023
@mattip
Copy link
Member

mattip commented Jul 7, 2023

Thanks @RonaldAJ and @mhvk. This is a nice API addition.

@seberg seberg added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label Jul 7, 2023
@RonaldAJ
Copy link
Contributor Author

RonaldAJ commented Jul 7, 2023

I think that issue #23741 can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants