Skip to content

gh-44098: Release the GIL while calling mmap() in the mmap module on Windows #14114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ZackerySpytz
Copy link
Contributor

@ZackerySpytz ZackerySpytz commented Jun 15, 2019

It is not safe to release the GIL for other relevant syscalls (for example, ftruncate()
or mremap()) in the mmap module -- segfaults could occur if the mmap
object is accessed from multiple threads.

https://bugs.python.org/issue1572968

@eryksun
Copy link
Contributor

eryksun commented Jun 15, 2019

For Windows we should ensure that the GIL is released for all calls to CreateFileMapping / MapViewOfFile, UnmapViewOfFile, SetFilePointer / SetEndOfFile, and CloseHandle (for the section and file handle). The CloseHandle calls can actually take as long as CreateFileMapping, MapViewOfFile, and UnmapViewOfFile. But to put this into perspective, a CreateFile call takes about 10 times longer. Note that SetEndOfFile can be on the order of a CreateFile call in terms of I/O system-call overhead and updating filesystem records for allocated disk space, but it doesn't actually zero the extended region on disk, which would be far more expensive. Zeroing occurs on access when the valid data length is extended.


Another issue I notice with this module is that tagname is a char * that we're encoding as UTF-8 and passing to ANSI CreateFileMappingA. This creates mojibake for non-ASCII strings on all but the latest Windows 10 systems that have the system locale configure to use UTF-8. tagname should be a wchar_t *, and we should be calling CreateFileMappingW.

@ZackerySpytz
Copy link
Contributor Author

Thank you, Eryk Sun.

If it is possible for access violations to occur when the mmap object is, for example, accessed by multiple threads, the GIL should not be released.

For what it's worth, I did encounter an access violation when releasing the GIL in mmap.resize() at some point (when using threads). It reminded me of https://bugs.python.org/issue16212, though that bug was encountered on Linux.

I will create a BPO issue for the mojibake issue.

@eryksun
Copy link
Contributor

eryksun commented Jun 17, 2019

Right. Releasing the global lock in mmap_resize_method would require redesigning the mmap module to synchronize access with a local lock, similar to what "_io/bufferedio.c" has to do to protect its internal state. resize isn't called frequently enough to justify the potential for introducing new bugs here.

I did a little timing test in C to gauge the relative time taken by the calls needed to create a file, resize it to 128 MiB, get the size, create a section for it, map a view, unmap the view, and close the section and file handles. The following is the result for 1000 trials, keeping only values within a standard deviation for each call, and normalizing the averages against that of CreateFileMapping:

Function Average Time
CreateFile 14.90
SetEndOfFile 4.34
GetFileSize 0.19
CreateFileMapping 1.00
MapViewOfFile 0.87
UnmapViewOfFile 0.95
CloseHandle, Section 0.42
CloseHandle, File 3.72

In new_mmap_object, we're not releasing the GIL for GetFileSize, but it's a relatively inexpensive call. But the cost of MapViewOfFile is on the order of CreateFileMapping. The two should be viewed as a unit. For example, given m_obj->data is initially NULL:

Py_BEGIN_ALLOW_THREADS
m_obj->map_handle = CreateFileMapping(m_obj->file_handle,
                                      NULL,
                                      flProtect,
                                      size_hi,
                                      size_lo,
                                      m_obj->tagname);
if (m_obj->map_handle == NULL) {
    dwErr = GetLastError();
} else {
    m_obj->data = (char *) MapViewOfFile(m_obj->map_handle,
                                         dwDesiredAccess,
                                         off_hi,
                                         off_lo,
                                         m_obj->size);
    if (m_obj->data == NULL) {
        dwErr = GetLastError();
        CloseHandle(m_obj->map_handle);
        m_obj->map_handle = NULL;
    }
}
Py_END_ALLOW_THREADS
if (m_obj->data == NULL) {
    Py_DECREF(m_obj);
    PyErr_SetFromWindowsErr(dwErr);
    return NULL;
}
return (PyObject *)m_obj;

@eryksun
Copy link
Contributor

eryksun commented Jun 17, 2019

In new_mmap_object, we're not releasing the GIL for GetFileSize, but it's a relatively inexpensive call.

On second thought, the file could be remote, in which case the underlying NtQueryInformationFile: FileStandardInformation call is subject to network latency. We should release the GIL when calling GetFileSize in mmap_size_method and new_mmap_object.


This is unrelated, but a possible bug. For Unix, shouldn't mmap_size_method return PyLong_FromSsize_t(self->size) if self->fd is -1, like the Windows implementation? Currently in Linux size() of an anonymous mapping raises OSError: [Errno 9] Bad file descriptor.

@brettcannon brettcannon added the type-bug An unexpected behavior, bug, or error label Jun 21, 2019
@ambv
Copy link
Contributor

ambv commented Aug 27, 2021

Hey, @isidentical @ZackerySpytz @eryksun: this has been authored and accepted two years back. If this is still relevant, can you rebase on main?

@netlify
Copy link

netlify bot commented Dec 12, 2022

Deploy Preview for python-cpython-preview ready!

Name Link
🔨 Latest commit c196466
🔍 Latest deploy log https://app.netlify.com/sites/python-cpython-preview/deploys/6396a25564a80c0009885b80
😎 Deploy Preview https://deploy-preview-14114--python-cpython-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@erlend-aasland erlend-aasland changed the title bpo-1572968: Release the GIL while calling mmap() in the mmap module gh-44098: Release the GIL while calling mmap() in the mmap module on Windows Jan 5, 2024
m_obj->map_handle = CreateFileMappingA(m_obj->file_handle,
NULL,
flProtect,
size_hi,
size_lo,
m_obj->tagname);
Py_END_ALLOW_THREADS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be after the MapViewOfFile call. We can probably restructure the function to store results in locals and then determine success/failure after we have the GIL back

@python-cla-bot
Copy link

The following commit authors need to sign the Contributor License Agreement:

CLA signed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review OS-windows type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.