The current mmap implementation in CPython's mmapmodule.c defines an 'exports' member to track the number of exported pointers when a user creates an mmap object, and it interferes with resizing and closing the map: https://github.com/python/cpython/blob/main/Modules/mmapmodule.c#L110
I'm using memory maps to share memory (numpy arrays) between processes for a scientific computing application, and the mapped memory needs to be resizable. I set up an mmap and then use numpy.frombuffer to view it as an array (simplified example below; the actual code is here: https://github.com/DESI-UR/VAST/blob/master/python/vast/voidfinder/_voidfinder_cython_find_next.pyx):
    import mmap
    import os
    import tempfile
    import numpy as np

    wall_galaxies_coords_fd, WCOORD_BUFFER_PATH = tempfile.mkstemp(prefix="voidfinder", dir=resource_dir, text=False)
    num_galaxies = galaxy_coords.shape[0]
    w_coord_buffer_length = num_galaxies * 3 * 8  # 3 for xyz and 8 for float64
    os.ftruncate(wall_galaxies_coords_fd, w_coord_buffer_length)
    self.points_buffer = mmap.mmap(wall_galaxies_coords_fd, w_coord_buffer_length)
    self.points_buffer.write(galaxy_coords.astype(np.float64).tobytes())
    del galaxy_coords
    galaxy_coords = np.frombuffer(self.points_buffer, dtype=np.float64)
    galaxy_coords.shape = (num_galaxies, 3)
    os.unlink(WCOORD_BUFFER_PATH)
    self.points_xyz = galaxy_coords
As part of treating this data as a periodic simulation, I may need to resize my points buffer later, and update my data structures like so:
    n_bytes = self.points_buffer.size()
    self.points_buffer.resize(n_bytes)
    galaxy_coords = np.frombuffer(self.points_buffer, dtype=np.float64)
    self.num_points = n_bytes // (3 * 8)
    galaxy_coords.shape = (self.num_points, 3)
    self.points_xyz = galaxy_coords
This code fails on the self.points_buffer.resize() call with 'BufferError: mmap can't resize with extant buffers exported.' because the self.points_xyz object still refers to the buffer exported by the mmap object. Similarly, in another section I get 'BufferError: cannot close exported pointers exist.' when I call the close() method of an mmap, for the same reason: another Python object still points to the underlying mmap object.
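Both errors can be reproduced without NumPy. The following is a minimal sketch, assuming a memoryview as a stand-in for the array returned by numpy.frombuffer (both hold an exported buffer on the mmap). The function name export_errors and the temporary-file prefix are illustrative and not part of the original code:

```python
import mmap
import os
import tempfile


def export_errors(n_bytes=4096):
    """Return the messages raised by resize() and close() while a buffer is exported."""
    fd, path = tempfile.mkstemp(prefix="mmap_demo")  # hypothetical prefix
    try:
        os.ftruncate(fd, n_bytes)
        buf = mmap.mmap(fd, n_bytes)
        view = memoryview(buf)  # stand-in for np.frombuffer(buf, ...)
        messages = {}
        try:
            buf.resize(2 * n_bytes)
        except BufferError as exc:
            messages["resize"] = str(exc)
        try:
            buf.close()
        except BufferError as exc:
            messages["close"] = str(exc)
        view.release()  # drop the export so the map can actually be closed
        buf.close()
        return messages
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling export_errors() should return one message for each of the two failing calls.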
Core Problem
Both of these cases can be solved with the addition of a single line of code* prior to the call to resize() or close():
    n_bytes = self.points_buffer.size()
    # Extra line of code: the Python pointer in self.points_xyz is redirected
    # away from self.points_buffer via a dummy object.
    self.points_xyz = my_dummy_array
    self.points_buffer.resize(n_bytes)
    galaxy_coords = np.frombuffer(self.points_buffer, dtype=np.float64)
    self.num_points = n_bytes // (3 * 8)  # integer division: shape entries must be ints
    galaxy_coords.shape = (self.num_points, 3)
    self.points_xyz = galaxy_coords
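The same workaround can be sketched without NumPy. This assumes a memoryview in place of the NumPy array, a platform where mmap.resize() is available (such as Linux), and hypothetical helper names (resize_with_view, demo). Every exported view has to be released before the resize and recreated afterwards:

```python
import mmap
import os
import tempfile


def resize_with_view(buf, view, new_size):
    """Release the exported view, resize the map, and return a fresh view."""
    view.release()          # the mmap's export count drops back to zero
    buf.resize(new_size)    # no BufferError now that nothing is exported
    return memoryview(buf)  # re-export against the resized map


def demo(n_bytes=4096):
    """Create a file-backed map, double its size, and return the new view length."""
    fd, path = tempfile.mkstemp(prefix="mmap_demo")  # hypothetical prefix
    try:
        os.ftruncate(fd, n_bytes)
        buf = mmap.mmap(fd, n_bytes)
        view = memoryview(buf)
        view = resize_with_view(buf, view, 2 * n_bytes)
        size = len(view)
        view.release()
        buf.close()
        return size
    finally:
        os.close(fd)
        os.unlink(path)
```

With a NumPy array there is no explicit release call, so in CPython the equivalent step is dropping or rebinding every reference to the array, which is what the dummy-object line does.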
At this point, it seems that the problem is actually Python imposing new mmap requirements (new as in not related to what you would expect to find in 'man mmap' or other POSIX docs) on what the programmer should and should not be allowed to do, given that the bug can be 'fixed' by an otherwise completely unnecessary line of code. Sorry if this waxes a bit philosophical; there is probably a whole debate to be had here about the distinction between process-private and shared resource tracking in Python and whether it can/can't/should/shouldn't/would/wouldn't be good or bad and so forth. My concrete motivation, however, is that the code would work completely fine if the mmap module simply didn't raise an unnecessary BufferError in this case. There is no actual problem with the buffer here. By way of analogy, Python doesn't raise an error if I have two handles open to the same regular file and I want to close or modify one of them.
I think possibly someone had the idea to 'help' programmers or extend Python's reference tracking in some way by doing some extra resource tracking here, perhaps related to the duplicate file descriptor issue with this mmap implementation: #78502. A subtle distinction, though, is that this 'extra tracking' is attempting to track resources which may not even have been created by the current Python interpreter: shared memory may have been created by a completely different process, for instance.
If this behavior is in some way intentional, at minimum it seems it ought to be thoroughly documented as part of the usage of the mmap object.
*Single line of code, because I have only one Python object referencing this memory map. If I had 10 objects referencing it, all 10 objects would need to be redirected to dummy objects or deleted.
Related Issues
Edit(s):
It looks like some of this behavior may have been added as a result of this issue: #57166. A possibly better path may have been to do as the user suggested and provide documentation updates instead of the code updates which introduced unusual behavior to mmap. This is now causing issues like mine and these:
#79867
https://stackoverflow.com/questions/53339931/properly-discarding-ctypes-pointers-to-mmap-memory-in-python
ercius/openNCEM#39
https://stackoverflow.com/questions/41077696/python-ctypes-from-buffer-mapping-with-context-manager-into-memory-mapped-file
CPython versions tested on:
3.10
Operating systems tested on:
Linux