BUG: str and repr special methods blow up memory usage #3544

Closed
cpelley opened this issue Jul 23, 2013 · 7 comments

Comments

@cpelley

cpelley commented Jul 23, 2013

>>> arr = np.zeros([43200, 21600], dtype=np.int8)
>>> z = np.ma.masked_values(arr, 0)
>>> print z
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/sci/lib/python2.7/site-packages/numpy/ma/core.py", line 3530, in __str__
    res = self._data.astype("|O8")
MemoryError

Calculated memory usage:
(43200*21600*8) + (43200*21600)
8398080000.0 bits --> 0.9776651859283 GB
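
As a quick check, the figure above can be reproduced with plain arithmetic. This is only a sketch of the reporter's estimate: it assumes 8 bits per `int8` element and, as in the report, 1 bit per mask element (a later comment in this thread notes the mask is actually stored as a byte array). The variable names are illustrative.

```python
# Shape taken from the report above.
rows, cols = 43200, 21600
n_elements = rows * cols

data_bits = n_elements * 8   # int8 data: 8 bits per element
mask_bits = n_elements       # assumption from the report: 1 bit per mask element
total_bits = data_bits + mask_bits

# Convert bits -> bytes -> binary gigabytes (2**30 bytes).
total_gib = total_bits / 8 / 1024**3

print(total_bits)   # 8398080000
print(total_gib)    # about 0.9777
```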

@charris
Member

charris commented Jul 23, 2013

The res array is of object type, 8 bytes/object on my system, so the result would be about 8GB just for the pointers to the objects. Where is the string part of the example?
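
A rough way to see where that estimate comes from, assuming a build where an object-array element is an 8-byte pointer (this varies by platform): multiply the element count by the object dtype's item size. The small array at the end only illustrates how `astype(object)` inflates storage; it is not the array from the report.

```python
import numpy as np

n_elements = 43200 * 21600

# Size of one object-array slot (a pointer); typically 8 on 64-bit builds.
pointer_size = np.dtype(object).itemsize
pointer_bytes = n_elements * pointer_size
print(pointer_bytes / 1e9, "GB for the pointers alone")

# Same effect on a small array: each int8 element becomes a pointer-sized slot.
small = np.zeros(1000, dtype=np.int8)
small_obj = small.astype(object)
print(small.nbytes, "->", small_obj.nbytes)
```

This counts only the pointer storage; the Python objects the pointers refer to need memory of their own.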

@pv
Member

pv commented Jul 23, 2013

I think the complaint is that __str__ shouldn't cast it to an object array in the first place in the intermediate steps.
It will in any case print just

[[-- -- -- ..., -- -- --]
 [-- -- -- ..., -- -- --]
 [-- -- -- ..., -- -- --]
 ..., 
 [-- -- -- ..., -- -- --]
 [-- -- -- ..., -- -- --]
 [-- -- -- ..., -- -- --]]

but here this takes ~9.6 GB of memory :)

@cpelley
Author

cpelley commented Jul 25, 2013

Thank you both for your interest. Yes @pv, you are describing the issue a lot more clearly than I did 😄

The trouble is that I cannot view my array, which itself fits comfortably in memory, especially since, as @pv indicated above, the resulting textual output is tiny.

I should have said ~2GB of memory usage for the actual data itself (forgot that the mask array is stored as a byte array).
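
The corrected figure can be checked on a scaled-down array. This is a sketch under the assumption that the mask is a boolean array using one byte per element; the small shape is chosen only so the example runs quickly.

```python
import numpy as np

# Scaled-down stand-in for the 43200 x 21600 array from the report.
z = np.ma.masked_values(np.zeros((432, 216), dtype=np.int8), 0)

data_bytes = z.data.nbytes
mask_bytes = np.ma.getmaskarray(z).nbytes  # boolean mask: one byte per element
print(data_bytes, mask_bytes)

# Scale the per-element cost up to the full-size array.
full_elements = 43200 * 21600
full_bytes = full_elements * (z.data.itemsize + np.ma.getmaskarray(z).itemsize)
print(full_bytes / 1e9, "GB for data plus mask")
```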

@jjhelmus jjhelmus added the component: numpy.ma masked arrays label Oct 30, 2015
@saimn
Contributor

saimn commented Nov 26, 2015

Following #6732, I also see a huge memory peak when printing a masked array. Worse, this memory is then lost, even if the array is dereferenced.

In [1]: import ipython_memory_usage.ipython_memory_usage as imu

In [2]: import numpy as np

In [3]: imu.start_watching_memory()
In [3] used 10.5586 MiB RAM in 4.25s, peaked 0.00 MiB above current, total RAM usage 35.71 MiB

In [4]: a = np.zeros((10000, 10000))
In [4] used 0.0312 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 35.74 MiB

In [5]: a[:] = 1
In [5] used 763.0000 MiB RAM in 0.21s, peaked 0.00 MiB above current, total RAM usage 798.74 MiB

In [6]: m = np.ma.MaskedArray(data=a, mask=np.ma.nomask, dtype=float, copy=False)
In [6] used 0.0000 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 798.74 MiB

In [7]: m
Out[7]: 
masked_array(data =
 [[ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 ..., 
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]],
             mask =
 False,
       fill_value = 1e+20)
In [7] used 0.0664 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 798.81 MiB

In [8]: m = np.ma.MaskedArray(data=a, mask=False, dtype=float, copy=False)
In [8] used 95.4336 MiB RAM in 10.31s, peaked 1525.52 MiB above current, total RAM usage 894.24 MiB

In [11]: m
Out[11]: 
masked_array(data =
 [[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 ..., 
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]],
             mask =
 [[False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]
 ..., 
 [False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]],
       fill_value = 1e+20)
In [11] used 2343.6602 MiB RAM in 2.03s, peaked 762.87 MiB above current, total RAM usage 3237.91 MiB

In [12]: m
Out[12]: 
masked_array(data =
 [[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 ..., 
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]
 [1.0 1.0 1.0 ..., 1.0 1.0 1.0]],
             mask =
 [[False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]
 ..., 
 [False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]],
       fill_value = 1e+20)
In [12] used 0.0156 MiB RAM in 2.02s, peaked 762.94 MiB above current, total RAM usage 3237.92 MiB

In [13]: m.shape
Out[13]: (10000, 10000)
In [13] used 0.0117 MiB RAM in 0.19s, peaked 0.00 MiB above current, total RAM usage 3237.93 MiB

In [14]: m = np.ma.MaskedArray(data=a, mask=np.ma.nomask, dtype=float, copy=False)
In [14] used 0.0039 MiB RAM in 0.18s, peaked 0.00 MiB above current, total RAM usage 3237.94 MiB

In [15]: m
Out[15]: 
masked_array(data =
 [[ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 ..., 
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]],
             mask =
 False,
       fill_value = 1e+20)
In [15] used 0.0000 MiB RAM in 0.18s, peaked 0.00 MiB above current, total RAM usage 3237.94 MiB
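
The difference between the two constructions in the session can be sketched on a small array: with `mask=np.ma.nomask` no mask array is stored, while `mask=False` is expanded to a full boolean array of the data's shape. The sizes computed at the end are back-of-the-envelope values for a 10000 x 10000 float array, offered as a plausible reading of the numbers in the session rather than a measurement.

```python
import numpy as np

a = np.ones((100, 100))

m_nomask = np.ma.MaskedArray(data=a, mask=np.ma.nomask, copy=False)
m_full = np.ma.MaskedArray(data=a, mask=False, copy=False)

# nomask is a shared scalar sentinel; no per-element mask is allocated.
print(np.ma.getmask(m_nomask) is np.ma.nomask)

# mask=False becomes a full boolean array, one byte per element.
full_mask = np.ma.getmask(m_full)
print(full_mask.shape, full_mask.nbytes)

# Rough sizes for the 10000 x 10000 case, in MiB.
mask_mib = 10000 * 10000 * 1 / 2**20   # boolean mask
data_mib = 10000 * 10000 * 8 / 2**20   # float64 data
print(round(mask_mib, 2), round(data_mib, 2))
```

These come out at roughly 95 MiB and 763 MiB, close to the increases reported after `In [8]` and `In [5]` respectively.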

saimn added a commit to saimn/numpy that referenced this issue Nov 27, 2015
…dArray.

Ref numpy#3544.

When printing a `MaskedArray`, the whole array is converted to the object dtype,
whereas only a few values are printed to screen. So the approach here is to cut
the array and keep only a subset that is used for the string conversion. This
way the output should not change.
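
The idea described in the commit message can be sketched as follows. This is an illustration only, not the code that was merged: `corner_subset` and its `edge` parameter are hypothetical names, and the real change may select the subset differently.

```python
import numpy as np

def corner_subset(arr, edge=3):
    """Keep the first and last `edge` entries along every long axis.

    Hypothetical helper for illustration; not NumPy's implementation.
    """
    for axis in range(arr.ndim):
        n = arr.shape[axis]
        if n > 2 * edge:
            # Indices of the leading and trailing `edge` entries on this axis.
            idx = np.r_[0:edge, n - edge:n]
            arr = arr.take(idx, axis=axis)
    return arr

big = np.ma.masked_values(np.zeros((2000, 1000), dtype=np.int8), 0)

# Crop first, then convert: only the small subset becomes an object array.
small = corner_subset(big)
as_objects = small.astype(object)
print(small.shape, as_objects.dtype)
```

Because the object conversion touches only the cropped corners, its cost no longer grows with the size of the full array.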
@saimn
Contributor

saimn commented Feb 4, 2016

Can be closed I think, since #6748 was merged.

@cpelley
Author

cpelley commented Feb 4, 2016

Hi @saimn, thanks for your PR. Testing now.

@cpelley
Author

cpelley commented Feb 4, 2016

Confirmed, thanks again @saimn

@cpelley cpelley closed this as completed Feb 4, 2016
jaimefrio pushed a commit to jaimefrio/numpy that referenced this issue Mar 22, 2016
…dArray.

Ref numpy#3544.

When printing a `MaskedArray`, the whole array is converted to the object dtype,
whereas only a few values are printed to screen. So the approach here is to cut
the array and keep only a subset that is used for the string conversion. This
way the output should not change.