High memory peak when MaskedArray creates a mask #6732
Comments
Just for reference, could you post the numpy version as well?
Yep sorry, I'm using 1.10.1
The reason I asked was that 1.10.1 has a known problem with structured/record arrays that shows up as long run times and huge memory consumption. You don't mention whether structured/record arrays are being used in addition to masked arrays, so it might not be the same problem. There was also at least one problem with masked arrays. Could you try 1.10.2rc1 by any chance? It would also be good to know if numpy 1.9 was better, so as to determine whether this is a regression or a long-standing problem.
I don't use record arrays here (but I know about this issue with astropy.io.fits). I have just forked and pulled the master branch and I get the same behavior.
…se (numpy#6732). When the `mask` parameter is set to True or False, directly create a boolean `ndarray` instead of going through `np.resize`, which was causing a memory peak of ~15 times the size of the mask.
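As a rough illustration of the idea in this commit message, the sketch below builds the mask directly when it is given as a plain `True` or `False`. It is not the actual patch from #6734: `full_bool_mask` is a hypothetical helper name, and the fallback through `np.resize` for other inputs is an assumption made only to keep the example complete.

```python
import numpy as np


def full_bool_mask(mask, shape):
    """Hypothetical helper: return a boolean mask of the given shape."""
    # Scalar True/False: allocate the full mask in one step.
    if mask is True:
        return np.ones(shape, dtype=bool)
    if mask is False:
        return np.zeros(shape, dtype=bool)
    # Anything else (assumed here): keep the generic resize path.
    return np.resize(np.asarray(mask, dtype=bool), shape)


m = full_bool_mask(False, (3, 4))
print(m.shape, m.dtype, m.any())
```

The shortcut only applies to the two scalar cases, so array-like masks would still be handled as before.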
* 'master' of git://github.com/numpy/numpy: (24 commits)
  BENCH: allow benchmark suite to run on Python 3
  TST: test f2py, fallback on f2py2.7 etc., fixes numpy#6718
  BUG: link cblas library if cblas is detected
  BUG/TST: Fix for numpy#6724, make numpy.ma.mvoid consistent with numpy.void
  BUG/TST: Fix numpy#6760 by correctly describing mask on nested subdtypes
  BUG: resizing empty array with complex dtype failed
  DOC: Add changelog for numpy#6734 and numpy#6748.
  Use integer division to avoid casting to int.
  Allow to change the maximum width with a class variable.
  Add some tests for mask creation with mask=True or False.
  Test that the mask dtype if MaskType before using np.zeros/ones
  BUG/TST: Fix for numpy#6729
  ENH: Avoid memory peak and useless computations when printing a MaskedArray.
  ENH: Avoid memory peak when creating a MaskedArray with mask=True/False (numpy#6732).
  BUG: Readd fallback CBLAS detection on linux.
  TST: Fix travis-ci test for numpy wheels.
  MAINT: Localize variables only used with relaxed stride checking.
  BUG: Fix for numpy#6719
  MAINT: enable Werror=vla in travis
  BUG: Include relevant files from numpy/linalg/lapack_lite in sdist.
  ...
Forgot to close this one (since #6734 was merged).
Hi,

We work with huge masked arrays, and noticed that there is a high memory peak when creating a masked array with `mask=False`. When using `MaskedArray(mask=np.ma.nomask ...)`, the memory does not increase and the masked array uses a view of the data array. But when using `MaskedArray(mask=False ...)`, there is a memory peak of ~15 times the size of the boolean mask that is created!

The issue comes from the line `mask = np.resize(mask, _data.shape)` in https://github.com/numpy/numpy/blob/master/numpy/ma/core.py#L2770, where mask is just `array([False], dtype=bool)`. `np.resize` then calls `concatenate((a,)*n_copies)`, which causes the memory peak (https://github.com/numpy/numpy/blob/master/numpy/core/fromnumeric.py#L1149).

Without knowing the reasons for this implementation, I wonder why the mask is not simply created with `np.zeros(dtype=bool, ...)` (or `np.ones`, depending on the value of the mask parameter).
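The difference between the two approaches can be sketched without `MaskedArray` at all, by comparing the `concatenate((a,)*n_copies)` pattern quoted above with a direct `np.zeros` allocation. This is a minimal sketch, not a reproduction of the original report: `peak_bytes` is a hypothetical helper, the measurement relies on Python's `tracemalloc` (which only sees allocations that are reported to it), and the exact figures will vary with the NumPy version and platform.

```python
import tracemalloc

import numpy as np


def peak_bytes(func):
    """Hypothetical helper: run func once, return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


n = 200_000  # number of mask elements (kept small so the example runs quickly)
seed = np.array([False], dtype=bool)

# Pattern described in the issue: repeat a 1-element array via concatenate.
# The temporary tuple alone holds one pointer per element.
via_concat, peak_concat = peak_bytes(lambda: np.concatenate((seed,) * n))

# Suggested alternative: allocate the boolean mask directly.
via_zeros, peak_zeros = peak_bytes(lambda: np.zeros(n, dtype=bool))

print(f"concatenate peak: {peak_concat / n:.1f} bytes per element")
print(f"np.zeros peak:    {peak_zeros / n:.1f} bytes per element")
```

Both calls produce the same all-False array; only the temporary memory needed to build it differs.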