-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Avoid memory peak when creating a MaskedArray with mask=True/False. #6734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…se (numpy#6732). When the `mask` parameter is set to True or False, create directly a `ndarray` of boolean instead of going inside `np.resize` which was causing of memory peak of ~15 times the size of the mask.
if mask is True: | ||
mask = np.ones(_data.shape, dtype=mdtype) | ||
elif mask is False: | ||
mask = np.zeros(_data.shape, dtype=mdtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I should also check here that mdtype is dtype('bool') ?
If I understand correctly mdtype can be a list for record arrays, in which case mask = np.array(mask, copy=copy, dtype=mdtype)
throws a TypeError, then we go to the except below which suppose that mask is a list.
So in theory one should not use mask = True or False with a record array, or a list this case is not handled ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, but a check on mdtype
sounds good if the wrong type raises an error. A try block isn't the best sort of flow control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these cases currently covered by tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will a test for mdtype
. For the unit tests it is covered by tests which use mask=True or False in the constructor, but maybe I can a test specific to the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charris done ! I have added a small test for mask=True/False, so this is explicitly tested.
BTW it is also much faster to allocate the mask array this way. |
@saimn, @gerritholl, @ahaldane, @jakirkham, I would like to get through the PRs involving masked arrays in the next few days and it would help if the various people working on that module could take a look at the other PRs. Just search on the |
ENH: Avoid memory peak when creating a MaskedArray with mask=True/False.
Thanks Simon. Is it really possible for |
@saimn BTW, I've been lax. Could you add a note to |
@charris : With masked record arrays,
|
* 'master' of git://github.com/numpy/numpy: (24 commits) BENCH: allow benchmark suite to run on Python 3 TST: test f2py, fallback on f2py2.7 etc., fixes numpy#6718 BUG: link cblas library if cblas is detected BUG/TST: Fix for numpy#6724, make numpy.ma.mvoid consistent with numpy.void BUG/TST: Fix numpy#6760 by correctly describing mask on nested subdtypes BUG: resizing empty array with complex dtype failed DOC: Add changelog for numpy#6734 and numpy#6748. Use integer division to avoid casting to int. Allow to change the maximum width with a class variable. Add some tests for mask creation with mask=True or False. Test that the mask dtype if MaskType before using np.zeros/ones BUG/TST: Fix for numpy#6729 ENH: Avoid memory peak and useless computations when printing a MaskedArray. ENH: Avoid memory peak when creating a MaskedArray with mask=True/False (numpy#6732). BUG: Readd fallback CBLAS detection on linux. TST: Fix travis-ci test for numpy wheels. MAINT: Localize variables only used with relaxed stride checking. BUG: Fix for numpy#6719 MAINT: enable Werror=vla in travis BUG: Include relevant files from numpy/linalg/lapack_lite in sdist. ...
Ref #6732.
When the
mask
parameter is set to True or False, create directly andarray
ofboolean instead of going inside
np.resize
which was causing of memory peak of~15 times the size of the mask.
cc @charris