Skip to content

ENH: Avoid memory peak when creating a MaskedArray with mask=True/False. #6734

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 1, 2015

Conversation

saimn
Copy link
Contributor

@saimn saimn commented Nov 26, 2015

Ref #6732.

When the mask parameter is set to True or False, create directly a ndarray of
boolean instead of going inside np.resize which was causing of memory peak of
~15 times the size of the mask.

cc @charris

…se (numpy#6732).

When the `mask` parameter is set to True or False, create directly a `ndarray` of
boolean instead of going inside `np.resize` which was causing of memory peak of
~15 times the size of the mask.
if mask is True:
mask = np.ones(_data.shape, dtype=mdtype)
elif mask is False:
mask = np.zeros(_data.shape, dtype=mdtype)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should also check here that mdtype is dtype('bool') ?
If I understand correctly mdtype can be a list for record arrays, in which case mask = np.array(mask, copy=copy, dtype=mdtype) throws a TypeError, then we go to the except below which suppose that mask is a list.
So in theory one should not use mask = True or False with a record array, or a list this case is not handled ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, but a check on mdtype sounds good if the wrong type raises an error. A try block isn't the best sort of flow control.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these cases currently covered by tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will a test for mdtype. For the unit tests it is covered by tests which use mask=True or False in the constructor, but maybe I can a test specific to the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charris done ! I have added a small test for mask=True/False, so this is explicitly tested.

@saimn
Copy link
Contributor Author

saimn commented Nov 26, 2015

BTW it is also much faster to allocate the mask array this way.

@charris
Copy link
Member

charris commented Dec 1, 2015

@saimn, @gerritholl, @ahaldane, @jakirkham, I would like to get through the PRs involving masked arrays in the next few days and it would help if the various people working on that module could take a look at the other PRs. Just search on the component: numpy.ma label.

charris added a commit that referenced this pull request Dec 1, 2015
ENH: Avoid memory peak when creating a MaskedArray with mask=True/False.
@charris charris merged commit b3a8994 into numpy:master Dec 1, 2015
@charris
Copy link
Member

charris commented Dec 1, 2015

Thanks Simon. Is it really possible for mdtype to be other than MaskType?

@charris
Copy link
Member

charris commented Dec 1, 2015

@saimn BTW, I've been lax. Could you add a note to doc/release/1.11.0-notes.rst under Improvements for both of your PRs?

@saimn saimn deleted the ma-mask-memory branch December 2, 2015 08:40
saimn added a commit to saimn/numpy that referenced this pull request Dec 2, 2015
@saimn
Copy link
Contributor Author

saimn commented Dec 2, 2015

@charris : With masked record arrays, _data.dtype.names is a tuple with the column names, and in this case mdtype = make_mask_descr(_data.dtype) which a multi-dimensional dtype.

charris added a commit that referenced this pull request Dec 2, 2015
colbych added a commit to colbych/numpy that referenced this pull request Dec 6, 2015
* 'master' of git://github.com/numpy/numpy: (24 commits)
  BENCH: allow benchmark suite to run on Python 3
  TST: test f2py, fallback on f2py2.7 etc., fixes numpy#6718
  BUG: link cblas library if cblas is detected
  BUG/TST: Fix for numpy#6724, make numpy.ma.mvoid consistent with numpy.void
  BUG/TST: Fix numpy#6760 by correctly describing mask on nested subdtypes
  BUG: resizing empty array with complex dtype failed
  DOC: Add changelog for numpy#6734 and numpy#6748.
  Use integer division to avoid casting to int.
  Allow to change the maximum width with a class variable.
  Add some tests for mask creation with mask=True or False.
  Test that the mask dtype if MaskType before using np.zeros/ones
  BUG/TST: Fix for numpy#6729
  ENH: Avoid memory peak and useless computations when printing a MaskedArray.
  ENH: Avoid memory peak when creating a MaskedArray with mask=True/False (numpy#6732).
  BUG: Readd fallback CBLAS detection on linux.
  TST: Fix travis-ci test for numpy wheels.
  MAINT: Localize variables only used with relaxed stride checking.
  BUG: Fix for numpy#6719
  MAINT: enable Werror=vla in travis
  BUG: Include relevant files from numpy/linalg/lapack_lite in sdist.
  ...
jaimefrio pushed a commit to jaimefrio/numpy that referenced this pull request Mar 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants