BUG: str and repr special methods blow up memory usage #3544

cpelley · 2013-07-23T09:26:52Z

>>> arr = np.zeros([43200, 21600], dtype=np.int8)
>>> z = np.ma.masked_values(arr, 0)
>>> print z
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/sci/lib/python2.7/site-packages/numpy/ma/core.py", line 3530, in __str__
    res = self._data.astype("|O8")
MemoryError

Calculated memory usage:
(43200*21600.*8) + (43200*21600)
8398080000.0 bits --> 0.9776651859283 GB

charris · 2013-07-23T16:39:39Z

The res array is of object type, 8 bytes/object on my system, so the result would be about 8GB just for the pointers to the objects. Where is the string part of the example?
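The pointer overhead described here can be checked with a few lines of arithmetic. This is only a sketch: it assumes one C pointer per element and reads the pointer size from the running interpreter, which gives roughly 7.5 GB on a typical 64-bit build. The variable names are made up for the illustration.

```python
import struct

ROWS, COLS = 43200, 21600  # shape taken from the report above
n_elements = ROWS * COLS

# Size of one C pointer on this interpreter (8 bytes on a typical 64-bit build).
pointer_size = struct.calcsize("P")

# An object array stores one pointer per element; the Python objects
# those pointers refer to are not counted here.
pointer_bytes = n_elements * pointer_size

print(n_elements, "elements")
print(pointer_bytes / 1e9, "GB for the pointers alone")
```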

pv · 2013-07-23T16:48:40Z

I think the complaint is that __str__ shouldn't cast it to an object array in the first place in the intermediate steps.
It will in any case print just

[[-- -- -- ..., -- -- --]
 [-- -- -- ..., -- -- --]
 [-- -- -- ..., -- -- --]
 ..., 
 [-- -- -- ..., -- -- --]
 [-- -- -- ..., -- -- --]
 [-- -- -- ..., -- -- --]]

but here this takes ~9.6 GB of memory :)

cpelley · 2013-07-25T09:24:05Z

Thank you both for your interest. Yes @pv, you are describing the issue much more clearly than I did 😄

The trouble is that I cannot view my array, which itself fits comfortably in memory, even though, as @pv has indicated above, the resulting textual output is tiny.

I should have said ~2GB of memory usage for the actual data itself (I forgot that the mask array is stored as a byte array).
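The corrected figure can be reproduced as follows. A rough sketch, assuming one byte per `int8` value and one byte per boolean mask entry; the variable names are only illustrative.

```python
n_elements = 43200 * 21600  # shape from the original report

data_bytes = n_elements * 1  # int8: one byte per value
mask_bytes = n_elements * 1  # boolean mask: one byte per entry, not one bit

total_bytes = data_bytes + mask_bytes

print(total_bytes / 1e9, "GB")
print(total_bytes / 2**30, "GiB")
```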

saimn · 2015-11-26T14:53:57Z

Following #6732, I also see a huge memory peak when printing a masked array and, worse, this memory is then lost even if the array is dereferenced.

In [1]: import ipython_memory_usage.ipython_memory_usage as imu

In [2]: import numpy as np

In [3]: imu.start_watching_memory()
In [3] used 10.5586 MiB RAM in 4.25s, peaked 0.00 MiB above current, total RAM usage 35.71 MiB

In [4]: a = np.zeros((10000, 10000))
In [4] used 0.0312 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 35.74 MiB

In [5]: a[:] = 1
In [5] used 763.0000 MiB RAM in 0.21s, peaked 0.00 MiB above current, total RAM usage 798.74 MiB

In [6]: m = np.ma.MaskedArray(data=a, mask=np.ma.nomask, dtype=float, copy=False)
In [6] used 0.0000 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 798.74 MiB

In [7]: m
Out[7]: 
masked_array(data =
 [[ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 ..., 
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]],
             mask =
 False,
       fill_value = 1e+20)
In [7] used 0.0664 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 798.81 MiB

In [8]: m = np.ma.MaskedArray(data=a, mask=False, dtype=float, copy=False)
In [8] used 95.4336 MiB RAM in 10.31s, peaked 1525.52 MiB above current, total RAM usage 894.24 MiB

In [11]: m
Out[11]: 
masked_array(data =
 [[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 ..., 
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]],
             mask =
 [[False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]
 ..., 
 [False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]],
       fill_value = 1e+20)
In [11] used 2343.6602 MiB RAM in 2.03s, peaked 762.87 MiB above current, total RAM usage 3237.91 MiB

In [12]: m
Out[12]: 
masked_array(data =
 [[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 ..., 
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]],
             mask =
 [[False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]
 ..., 
 [False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]],
       fill_value = 1e+20)
In [12] used 0.0156 MiB RAM in 2.02s, peaked 762.94 MiB above current, total RAM usage 3237.92 MiB

In [13]: m.shape
Out[13]: (10000, 10000)
In [13] used 0.0117 MiB RAM in 0.19s, peaked 0.00 MiB above current, total RAM usage 3237.93 MiB

In [14]: m = np.ma.MaskedArray(data=a, mask=np.ma.nomask, dtype=float, copy=False)
In [14] used 0.0039 MiB RAM in 0.18s, peaked 0.00 MiB above current, total RAM usage 3237.94 MiB

In [15]: m
Out[15]: 
masked_array(data =
 [[ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 ..., 
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]],
             mask =
 False,
       fill_value = 1e+20)
In [15] used 0.0000 MiB RAM in 0.18s, peaked 0.00 MiB above current, total RAM usage 3237.94 MiB

…dArray. Ref numpy#3544. When printing a `MaskedArray`, the whole array is converted to the object dtype, whereas only a few values are printed to screen. So the approach here is to cut the array and keep only a subset that is used for the string conversion. This way the output should not change.
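The idea in that commit message, trimming the array before the object conversion, can be illustrated roughly as below. This is not the code from #6748, only a minimal sketch under assumptions: `corner_subset` and its `edge` parameter are hypothetical names, and the data and mask are trimmed as plain arrays before the cast.

```python
import numpy as np


def corner_subset(arr, edge=3):
    """Keep only the first and last `edge` entries along every axis.

    Hypothetical helper for this sketch, not a NumPy function.
    """
    for axis in range(arr.ndim):
        n = arr.shape[axis]
        if n > 2 * edge:
            head = [slice(None)] * arr.ndim
            tail = [slice(None)] * arr.ndim
            head[axis] = slice(0, edge)
            tail[axis] = slice(n - edge, n)
            # Only the two ends are copied, so the result stays small.
            arr = np.concatenate((arr[tuple(head)], arr[tuple(tail)]), axis=axis)
    return arr


data = np.ones((2000, 2000))
mask = np.zeros(data.shape, dtype=bool)

# Trim data and mask first, then do the expensive object cast
# on the small pieces only.
small = np.ma.MaskedArray(corner_subset(data), mask=corner_subset(mask))
as_objects = small.data.astype(object)

print(small.shape)
print(as_objects.dtype)
```

With `edge=3` the object cast touches a 6 by 6 array instead of the full 2000 by 2000 one, which is why the peak disappears while the values at the edges stay the same.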

saimn · 2016-02-04T09:26:19Z

Can be closed I think, since #6748 was merged.

cpelley · 2016-02-04T10:23:04Z

Hi @saimn, thanks for your PR. Testing now.

cpelley · 2016-02-04T10:29:48Z

Confirmed, thanks again @saimn


charris added Defect labels Feb 23, 2014

jjhelmus added the component: numpy.ma masked arrays label Oct 30, 2015

saimn mentioned this issue Nov 30, 2015

ENH: Avoid memory peak and useless computations when printing a MaskedArray. #6748

Merged

cpelley closed this as completed Feb 4, 2016

abalkin mentioned this issue May 11, 2016

BUG: error in printing masked arrays #7621

Closed
