ENH: Avoid memory peak when creating a MaskedArray with mask=True/False. #6734

saimn · 2015-11-26T14:31:25Z

When the mask parameter is set to True or False, create a boolean ndarray
directly instead of going through np.resize, which was causing a memory peak of
~15 times the size of the mask.
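A minimal sketch comparing the two paths. It assumes the old code turned the scalar flag into a one-element boolean array and expanded it with `np.resize`; the helper names `mask_via_resize`, `mask_direct` and `peak_bytes` are hypothetical. Both paths must give the same mask, and `tracemalloc` reports the peak traced allocation of each one. The actual numbers vary with NumPy version and platform, so none are claimed here.

```python
import tracemalloc

import numpy as np


def mask_via_resize(flag, shape):
    # Hypothetical reconstruction of the old path: a one-element boolean
    # array is expanded to the data shape with np.resize.
    return np.resize(np.array(flag, dtype=bool), shape)


def mask_direct(flag, shape):
    # Path taken by this PR: allocate the full boolean mask in one call.
    return np.ones(shape, dtype=bool) if flag else np.zeros(shape, dtype=bool)


def peak_bytes(fn, *args):
    # Peak memory traced by tracemalloc while fn(*args) runs.
    tracemalloc.start()
    try:
        fn(*args)
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    shape = (1000, 1000)
    for flag in (True, False):
        # Both paths must produce exactly the same mask.
        assert np.array_equal(mask_via_resize(flag, shape), mask_direct(flag, shape))
    print("np.resize peak:", peak_bytes(mask_via_resize, True, shape))
    print("np.ones peak:  ", peak_bytes(mask_direct, True, shape))
```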

cc @charris


saimn · 2015-11-26T14:34:29Z

numpy/ma/core.py

+            if mask is True:
+                mask = np.ones(_data.shape, dtype=mdtype)
+            elif mask is False:
+                mask = np.zeros(_data.shape, dtype=mdtype)


Maybe I should also check here that `mdtype` is `dtype('bool')`?
If I understand correctly, `mdtype` can be a list for record arrays, in which case `mask = np.array(mask, copy=copy, dtype=mdtype)` raises a `TypeError`, and we then go to the `except` below, which assumes that `mask` is a list.
So, in theory, mask=True or False should not be used with a record array (or with a list); is this case simply not handled?
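For reference, a sketch of what the list fallback has to do for a record array, assuming (as the comment suggests) that `mask` is a sequence with one boolean per record: each flag is repeated across every field of the structured mask dtype. The function `mask_from_flags` is hypothetical; `np.ma.make_mask_descr` is the real helper that builds the boolean mask dtype.

```python
import numpy as np


def mask_from_flags(flags, mdtype):
    # Hypothetical helper: one flag per record, repeated across all fields.
    # len() of a structured dtype is its number of fields.
    return np.array([tuple([bool(f)] * len(mdtype)) for f in flags], dtype=mdtype)


data_dtype = np.dtype([("a", np.int64), ("b", np.float64)])
mdtype = np.ma.make_mask_descr(data_dtype)  # one boolean field per column
m = mask_from_flags([True, False, True], mdtype)
print(m)
```

A bare `True` or `False` is not a sequence of records, which is why it cannot take this path.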

I don't know, but a check on mdtype sounds good if the wrong type raises an error. A try block isn't the best sort of flow control.
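A sketch of the check being discussed, written as a standalone function rather than the actual `MaskedArray.__new__` code (the name `build_mask` is hypothetical, and reconciling the mask shape with the data shape is omitted). The `np.ones`/`np.zeros` fast path is taken only when the mask dtype is the plain boolean `np.ma.MaskType`. Structured mask dtypes are handled explicitly instead of relying on a `TypeError`.

```python
import numpy as np


def build_mask(mask, data):
    # Hypothetical stand-in for the mask-creation step of the constructor.
    data = np.asarray(data)
    mdtype = np.ma.make_mask_descr(data.dtype)
    if mask is True or mask is False:
        if mdtype == np.dtype(np.ma.MaskType):
            # Plain boolean mask: allocate it directly, no np.resize.
            if mask:
                return np.ones(data.shape, dtype=mdtype)
            return np.zeros(data.shape, dtype=mdtype)
        # Structured mask: a scalar assigned to a structured array
        # is written to every field.
        out = np.zeros(data.shape, dtype=mdtype)
        out[...] = mask
        return out
    # Anything else: let NumPy convert it and let errors propagate.
    return np.array(mask, dtype=mdtype)
```

Branching on the dtype up front keeps the error path for genuinely invalid input, in line with the remark that a try block is poor flow control.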

Are these cases currently covered by tests?

OK, I will add a test for mdtype. As for unit tests, this is already covered by tests that use mask=True or False in the constructor, but maybe I can add a test specific to the constructor.

@charris Done! I have added a small test for mask=True/False, so this is now explicitly tested.
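A minimal sketch of such a test, written against the public `np.ma.array` constructor. It is an illustration, not the test that was committed, and the test function name is hypothetical. `np.ma.getmaskarray` is used so the assertions hold even if a `False` mask were stored as `nomask`.

```python
import numpy as np


def test_mask_true_false_in_constructor():
    data = np.arange(12).reshape(3, 4)
    for flag in (True, False):
        x = np.ma.array(data, mask=flag)
        m = np.ma.getmaskarray(x)
        # The mask must be a full boolean array with the data's shape...
        assert m.shape == data.shape
        assert m.dtype == np.dtype(bool)
        # ...filled with the requested flag everywhere.
        assert (m == flag).all()


if __name__ == "__main__":
    test_mask_true_false_in_constructor()
    print("ok")
```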

saimn · 2015-11-26T14:47:48Z

BTW it is also much faster to allocate the mask array this way.
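To check the speed claim on a given machine, a rough `timeit` comparison can be used. The shape and repeat counts are arbitrary choices, `via_resize` assumes the same old path as above, and no particular speedup is implied.

```python
import timeit

import numpy as np

SHAPE = (1000, 1000)


def via_resize():
    # Expand a one-element boolean array to SHAPE (assumed old path).
    return np.resize(np.array(True, dtype=bool), SHAPE)


def via_ones():
    # Allocate the mask directly (path from this PR).
    return np.ones(SHAPE, dtype=bool)


if __name__ == "__main__":
    for fn in (via_resize, via_ones):
        # Best of 3 runs, each calling fn 3 times.
        best = min(timeit.repeat(fn, number=3, repeat=3))
        print(f"{fn.__name__}: {best / 3 * 1e3:.2f} ms per call")
```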

charris · 2015-12-01T17:55:40Z

@saimn, @gerritholl, @ahaldane, @jakirkham, I would like to get through the PRs involving masked arrays in the next few days and it would help if the various people working on that module could take a look at the other PRs. Just search on the component: numpy.ma label.


charris · 2015-12-01T23:24:28Z

Thanks Simon. Is it really possible for mdtype to be other than MaskType?

charris · 2015-12-01T23:43:24Z

@saimn BTW, I've been lax. Could you add a note to doc/release/1.11.0-notes.rst under Improvements for both of your PRs?

saimn · 2015-12-02T09:24:45Z

@charris: With masked record arrays, `_data.dtype.names` is a tuple of the column names, and in that case `mdtype = make_mask_descr(_data.dtype)` is a structured dtype with one boolean field per column, not `MaskType`.
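This can be checked directly with `np.ma.make_mask_descr`. For a plain dtype it returns the boolean `MaskType`. For a record dtype it returns a structured dtype with one boolean field per column, so `mdtype == MaskType` is false and the `np.ones`/`np.zeros` shortcut must not be used. The field names `x` and `y` are arbitrary.

```python
import numpy as np

plain = np.ma.make_mask_descr(np.dtype(np.float64))
record = np.ma.make_mask_descr(np.dtype([("x", np.float64), ("y", np.int32)]))

# The plain dtype maps to MaskType; the record dtype keeps its field names
# but every field becomes boolean.
print(plain, record)
print(plain == np.ma.MaskType, record == np.ma.MaskType)  # True False
```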

See #6754 (DOC: Add changelog for #6734 and #6748.) for the changelog.


* 'master' of git://github.com/numpy/numpy: (24 commits)
  BENCH: allow benchmark suite to run on Python 3
  TST: test f2py, fallback on f2py2.7 etc., fixes numpy#6718
  BUG: link cblas library if cblas is detected
  BUG/TST: Fix for numpy#6724, make numpy.ma.mvoid consistent with numpy.void
  BUG/TST: Fix numpy#6760 by correctly describing mask on nested subdtypes
  BUG: resizing empty array with complex dtype failed
  DOC: Add changelog for numpy#6734 and numpy#6748.
  Use integer division to avoid casting to int.
  Allow to change the maximum width with a class variable.
  Add some tests for mask creation with mask=True or False.
  Test that the mask dtype if MaskType before using np.zeros/ones
  BUG/TST: Fix for numpy#6729
  ENH: Avoid memory peak and useless computations when printing a MaskedArray.
  ENH: Avoid memory peak when creating a MaskedArray with mask=True/False (numpy#6732).
  BUG: Readd fallback CBLAS detection on linux.
  TST: Fix travis-ci test for numpy wheels.
  MAINT: Localize variables only used with relaxed stride checking.
  BUG: Fix for numpy#6719
  MAINT: enable Werror=vla in travis
  BUG: Include relevant files from numpy/linalg/lapack_lite in sdist.
  ...

ENH: Avoid memory peak when creating a MaskedArray with mask=True/False (numpy#6732).

531f2ad

When the `mask` parameter is set to True or False, create a boolean `ndarray` directly instead of going through `np.resize`, which was causing a memory peak of ~15 times the size of the mask.

saimn reviewed Nov 26, 2015

charris added 00 - Bug component: numpy.ma masked arrays labels Nov 26, 2015

saimn added 2 commits December 1, 2015 22:07

Test that the mask dtype if MaskType before using np.zeros/ones

70d8cf5

Add some tests for mask creation with mask=True or False.

511dab4

charris added a commit that referenced this pull request Dec 1, 2015

Merge pull request #6734 from saimn/ma-mask-memory

b3a8994

ENH: Avoid memory peak when creating a MaskedArray with mask=True/False.

charris merged commit b3a8994 into numpy:master Dec 1, 2015

saimn deleted the ma-mask-memory branch December 2, 2015 08:40

saimn added a commit to saimn/numpy that referenced this pull request Dec 2, 2015

DOC: Add changelog for numpy#6734 and numpy#6748.

f752d84

charris added a commit that referenced this pull request Dec 2, 2015

Merge pull request #6754 from saimn/ma-changelog

45ff556

DOC: Add changelog for #6734 and #6748.

saimn mentioned this pull request Feb 4, 2016

High memory peak when MaskedArray creates a mask #6732

Closed

jaimefrio pushed a commit to jaimefrio/numpy that referenced this pull request Mar 22, 2016

DOC: Add changelog for numpy#6734 and numpy#6748.

071b6de

saimn mentioned this pull request Sep 2, 2020

ENH: Set mask=None equivalent to mask=False in np.ma.array() #17212

Closed
