Description
When I index an np.memmap instance with a slice, the slice object ends up with a refcount that is too high and is never collected (numpy 1.13.3, Python 2.7.12, Ubuntu 16.04).
Demonstration (using pympler to collect all live objects, installed with pip install pympler, and assuming /etc/fstab is readable; any file will do):
import sys
import numpy as np
from collections import deque
from pympler import muppy
print(sys.version)
print(np.__version__)
print('Before: %d live slices' %
      (len([o for o in muppy.get_objects() if isinstance(o, slice)])))
x = np.empty(100)
deque((x[:3] for _ in range(1000)), maxlen=0)
print('1000 times __getitem__(slice) on ndarray: %d live slices' %
      (len([o for o in muppy.get_objects() if isinstance(o, slice)])))
x = np.memmap('/etc/fstab', dtype=np.int8, mode='r')
deque((x[:5] for _ in range(1000)), maxlen=0)
print('1000 times __getitem__(slice) on memmap: %d live slices' %
      (len([o for o in muppy.get_objects() if isinstance(o, slice)])))
deque((x[:7,] for _ in range(1000)), maxlen=0)
print('1000 times __getitem__(tuple) on memmap: %d live slices' %
      (len([o for o in muppy.get_objects() if isinstance(o, slice)])))
print('some of these slices:')
print([o for o in muppy.get_objects() if isinstance(o, slice)][-3:])
Output:
[GCC 5.4.0 20160609]
1.13.3
Before: 4 live slices
1000 times __getitem__(slice) on ndarray: 4 live slices
1000 times __getitem__(slice) on memmap: 1004 live slices
1000 times __getitem__(tuple) on memmap: 1004 live slices
some of these slices:
[slice(0, 5, None), slice(0, 5, None), slice(0, 5, None)]
So x[:5] for an np.memmap leaks a slice(0, 5, None), while x[:7,] does not leak the slice. We can also further inspect the refcount and references:
>>> s = [o for o in muppy.get_objects() if isinstance(o, slice)][-1]
>>> import gc
>>> sys.getrefcount(s)
4
>>> len(gc.get_referrers(s))
2
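For a check that does not need pympler, the refcount inspection above could be wrapped in a small helper. This is only a sketch: index_refcount_delta is a hypothetical name, and it passes one explicitly constructed index object to __getitem__, which may not reach the same code path as the x[:5] syntax on Python 2. A result of zero therefore does not rule out the leak reported here.

```python
import sys


def index_refcount_delta(obj, index, repeats=1000):
    """Return how much the refcount of `index` changes after `repeats` lookups.

    Hypothetical helper: `obj` is anything indexable, `index` is an
    explicitly constructed index object such as slice(0, 5).
    """
    before = sys.getrefcount(index)
    for _ in range(repeats):
        obj[index]  # the result is discarded right away
    return sys.getrefcount(index) - before


# A plain list stands in for the array here so the sketch runs without numpy;
# pass an np.memmap instead to probe the case from this report.
delta = index_refcount_delta(list(range(100)), slice(0, 5))
print('refcount change after 1000 lookups: %d' % delta)
```

A container that releases every reference it takes should leave the refcount unchanged; a delta that grows with repeats would point to leaked references on the index object.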
My colleague @superbock ran into this before and said it went away in Python 3; I haven't tested this yet. He also found the workaround of adding a comma (x[:7,] instead of x[:7]) to use a different code path.
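If adding the comma at every call site is impractical, the same idea could be applied in one place with a thin wrapper that always forwards a tuple index. This is a minimal sketch, not part of numpy: TupleIndexed is a hypothetical name, and that it avoids the leak is assumed only from the x[:7,] observation above.

```python
class TupleIndexed(object):
    """Hypothetical wrapper that forwards every index to `base` as a tuple."""

    def __init__(self, base):
        self.base = base

    def __getitem__(self, index):
        # x[:7] arrives as a bare slice; wrap it so that base sees (slice,),
        # which is what the x[:7,] spelling passes.
        if not isinstance(index, tuple):
            index = (index,)
        return self.base[index]
```

Usage would look like safe = TupleIndexed(np.memmap(...)) followed by safe[:7]. For basic slicing, numpy treats a one-element tuple index the same as the bare index, so the result should be unchanged.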
Note that this leak can become a major problem: I'm using memmaps in my iteration code for training neural networks, and the leak ate up all available memory within 6 hours and had my process killed.