BUG: str and repr special methods blow up memory usage #3544
The `res` array has object dtype, which takes 8 bytes per object (pointer) on my system, so the result would be about 8 GB just for the pointers to the objects. Where is the string part of the example?
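A minimal sketch of the pointer-only estimate described above. It assumes a typical 64-bit build, where `np.dtype(object).itemsize` is 8. The helper name `object_pointer_bytes` is hypothetical, and it ignores the memory of the Python objects themselves. The shape of the original `res` array is not given here, so the example uses a hypothetical 10**9-element shape.

```python
import numpy as np

def object_pointer_bytes(shape):
    """Bytes used only by the element pointers of an object array of `shape`.

    Hypothetical helper for illustration; the referenced Python objects
    (floats, strings, ...) are not counted.
    """
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(object).itemsize

# One pointer per element; 8 bytes each on a typical 64-bit build.
print(np.dtype(object).itemsize)

# Cross-check against a small, actually allocated object array.
small = np.empty((100, 100), dtype=object)
assert small.nbytes == object_pointer_bytes(small.shape)

# A hypothetical 10**9-element array needs about 8 GB just for pointers.
print(object_pointer_bytes((1000, 1000, 1000)) / 1e9, "GB")
```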
I think the complaint is that
but here this takes ~9.6 GB of memory :)
Thank you both for your interest. Yes, @pv, you are describing the issue a lot more clearly than I have 😄 The trouble is that I cannot view my array, even though it comfortably fits in memory and, as @pv pointed out above, the textual output is tiny. I should have said ~2 GB of memory usage for the actual data itself (I forgot that the mask array is stored as a byte array).
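A small check of the point that the mask uses one byte per element rather than one bit. The 1-byte `uint8` data dtype is an illustrative assumption, since the dtype of the original array is not stated in this comment.

```python
import numpy as np

# Illustrative 1-byte data dtype (an assumption, not taken from the issue).
data = np.zeros((4, 5), dtype=np.uint8)
m = np.ma.MaskedArray(data, mask=np.zeros(data.shape, dtype=bool))

mask = np.ma.getmaskarray(m)
print(mask.dtype, mask.itemsize)  # bool 1: one byte per mask element
print(data.nbytes, mask.nbytes)   # 20 20: mask is as large as 1-byte data
```

With 1-byte data, the mask therefore doubles the footprint instead of adding only 1/8.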
Following #6732, I also see a huge memory peak when printing a masked array, and worse, this memory is not released afterwards, even if the array is dereferenced.
In [1]: import ipython_memory_usage.ipython_memory_usage as imu
In [2]: import numpy as np
In [3]: imu.start_watching_memory()
In [3] used 10.5586 MiB RAM in 4.25s, peaked 0.00 MiB above current, total RAM usage 35.71 MiB
In [4]: a = np.zeros((10000, 10000))
In [4] used 0.0312 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 35.74 MiB
In [5]: a[:] = 1
In [5] used 763.0000 MiB RAM in 0.21s, peaked 0.00 MiB above current, total RAM usage 798.74 MiB
In [6]: m = np.ma.MaskedArray(data=a, mask=np.ma.nomask, dtype=float, copy=False)
In [6] used 0.0000 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 798.74 MiB
In [7]: m
Out[7]:
masked_array(data =
[[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
...,
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]],
mask =
False,
fill_value = 1e+20)
In [7] used 0.0664 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 798.81 MiB
In [8]: m = np.ma.MaskedArray(data=a, mask=False, dtype=float, copy=False)
In [8] used 95.4336 MiB RAM in 10.31s, peaked 1525.52 MiB above current, total RAM usage 894.24 MiB
In [11]: m
Out[11]:
masked_array(data =
[[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
...,
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]],
mask =
[[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]],
fill_value = 1e+20)
In [11] used 2343.6602 MiB RAM in 2.03s, peaked 762.87 MiB above current, total RAM usage 3237.91 MiB
In [12]: m
Out[12]:
masked_array(data =
[[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
...,
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]
[1.0 1.0 1.0 ..., 1.0 1.0 1.0]],
mask =
[[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]],
fill_value = 1e+20)
In [12] used 0.0156 MiB RAM in 2.02s, peaked 762.94 MiB above current, total RAM usage 3237.92 MiB
In [13]: m.shape
Out[13]: (10000, 10000)
In [13] used 0.0117 MiB RAM in 0.19s, peaked 0.00 MiB above current, total RAM usage 3237.93 MiB
In [14]: m = np.ma.MaskedArray(data=a, mask=np.ma.nomask, dtype=float, copy=False)
In [14] used 0.0039 MiB RAM in 0.18s, peaked 0.00 MiB above current, total RAM usage 3237.94 MiB
In [15]: m
Out[15]:
masked_array(data =
[[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
...,
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]],
mask =
False,
fill_value = 1e+20)
In [15] used 0.0000 MiB RAM in 0.18s, peaked 0.00 MiB above current, total RAM usage 3237.94 MiB
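The session above contrasts `mask=np.ma.nomask`, which prints `mask = False` as a scalar, with `mask=False`, which prints a full boolean array. A minimal sketch of that difference on a smaller array follows. A 10000 x 10000 boolean mask is 10**8 bytes (about 95.4 MiB), which roughly matches the increase reported at `In [8]`. The shapes used here are illustrative.

```python
import numpy as np

a = np.ones((1000, 1000))

# mask=np.ma.nomask: no mask array is allocated at all.
m_nomask = np.ma.MaskedArray(data=a, mask=np.ma.nomask, dtype=float)
print(np.ma.getmask(m_nomask) is np.ma.nomask)  # True

# mask=False: the scalar is expanded to a boolean array with a.shape.
m_false = np.ma.MaskedArray(data=a, mask=False, dtype=float)
full_mask = np.ma.getmask(m_false)
print(full_mask.shape, full_mask.nbytes)  # (1000, 1000) 1000000
```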
…dArray. Ref numpy#3544. When printing a `MaskedArray`, the whole array is converted to the object dtype, whereas only a few values are printed to screen. So the approach here is to cut the array and keep only the subset that is used for the string conversion. This way the output should not change.
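A rough sketch of the idea described in this commit message: keep only the leading and trailing entries along each axis before converting to object dtype. This is not the code merged in numpy. The helper `edge_subset` is hypothetical, and it skips the `threshold` check that decides whether numpy summarizes at all. The default of 3 mirrors numpy's default `edgeitems` print option.

```python
import numpy as np

def edge_subset(arr, edgeitems=3):
    """Keep only the first and last `edgeitems` entries along every axis.

    Hypothetical helper sketching the idea; not numpy's implementation.
    Uses .take(), which MaskedArray also implements, so the mask is kept.
    """
    for axis in range(arr.ndim):
        size = arr.shape[axis]
        if size > 2 * edgeitems:
            keep = np.r_[0:edgeitems, size - edgeitems:size]
            arr = arr.take(keep, axis=axis)
    return arr

big = np.ma.MaskedArray(np.ones((1000, 1000)), mask=False)
small = edge_subset(big)
print(small.shape)  # (6, 6)

# Only the small subset is converted to object dtype for formatting.
as_objects = small.astype(object)
print(as_objects.size)  # 36 objects instead of 10**6
```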
I think this can be closed, since #6748 was merged.
Hi @saimn, thanks for your PR. Testing now.
Confirmed, thanks again @saimn
Calculated memory usage (8 bits of data plus 1 bit of mask per element):
`(43200*21600.*8) + (43200*21600)` = 8398080000.0 bits --> 0.9776651859283 GiB
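The arithmetic above can be reproduced as follows, together with the correction made later in the thread that the mask takes a full byte per element. This sketch assumes 1 byte of data per element, as the `*8` bits term implies. The variable names are illustrative.

```python
import numpy as np

rows, cols = 43200, 21600
n = rows * cols  # number of elements

# Original estimate: 8 bits of data plus 1 bit of mask per element.
bits = n * 8 + n
bits_as_gib = bits / 8 / 1024**3
print(bits, bits_as_gib)  # 8398080000, about 0.978 GiB

# NumPy stores a boolean mask as one byte per element, not one bit.
# Assuming 1 byte of data per element, data plus mask is 2 bytes/element.
mask_bytes_per_element = np.dtype(np.bool_).itemsize
corrected_bytes = n * 1 + n * mask_bytes_per_element
print(corrected_bytes / 1024**3, "GiB;", corrected_bytes / 1e9, "GB")
```

The corrected figure, about 1.87 GB, is consistent with the "~2 GB" stated in the thread.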