Skip to content

[C API] Revisit usage of the PyCapsule C API with multi-phase initialization API #85964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vstinner opened this issue Sep 16, 2020 · 19 comments
Closed
Labels
3.10 only security fixes topic-C-API

Comments

@vstinner
Copy link
Member

BPO 41798
Nosy @vstinner, @tiran, @encukou, @corona10, @yol, @shihai1991, @erlend-aasland
PRs
  • bpo-41798: Allocate the expat_CAPI on the heap memory. #24061
  • bpo-41798: Allocate the datetime.datetime_CAPI on the heap memory. #24096
  • bpo-41798: Allocate _decimal extension module CAPI on the heap #24117
  • bpo-41798: Allocate socket extension module CAPI on the heap #24126
  • bpo-41798: Allocate unicodedata CAPI on the heap #24128
  • bpo-41798: Allocate the _curses._C_API on the heap memory. #24186
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-01-23.12:49:36.720>
    created_at = <Date 2020-09-16.14:07:49.185>
    labels = ['expert-C-API', '3.10']
    title = '[C API] Revisit usage of the PyCapsule C API with multi-phase initialization API'
    updated_at = <Date 2021-01-25.09:46:36.596>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-01-25.09:46:36.596>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-01-23.12:49:36.720>
    closer = 'shihai1991'
    components = ['C API']
    creation = <Date 2020-09-16.14:07:49.185>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41798
    keywords = ['patch']
    message_count = 19.0
    messages = ['376992', '376995', '379104', '379733', '384281', '384389', '384392', '384393', '384394', '384395', '384429', '384430', '384494', '384537', '384538', '385330', '385486', '385533', '385610']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'christian.heimes', 'petr.viktorin', 'corona10', 'pkerling', 'shihai1991', 'erlendaasland']
    pr_nums = ['24061', '24096', '24117', '24126', '24128', '24186']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue41798'
    versions = ['Python 3.10']

    @vstinner
    Copy link
    Member Author

    More and more extension modules are converted to the multi-phase initialization API (PEP-489) in bpo-1635741.

    The problem is that the usage of the PyCapsule C API is not adapted in these extensions, to isolate well capsule objects from other module instances.

    For example, the pyexpat extension uses "static struct PyExpat_CAPI capi;". It was fine when it was not possible to create more than one instance of the extension module. But with PR 22222 (currently under review), it becomes possible to have multiple extension module instances. Each module instance creates its own capsule object, but all capsule object points to the same unique "struct PyExpat_CAPI" instance.

    For the specific case of the pyexpat in its current implementation, reusing the same "struct PyExpat_CAPI" instance is ok-ish, since the value is the same for all module instances. But this design sounds fragile.

    It would be safer to allocate a "struct PyExpat_CAPI" on the heap memory in each module instance, and use a PyCapsule destructor function (3rd parameter of PyCapsule_New()).

    The _ctypes does that:
    ---

            void *space = PyMem_Calloc(2, sizeof(int));
            if (space == NULL)
                return NULL;
            errobj = PyCapsule_New(space, CTYPES_CAPSULE_NAME_PYMEM, pymem_destructor);

    with:
    ---

    static void pymem_destructor(PyObject *ptr)
    {
        void *p = PyCapsule_GetPointer(ptr, CTYPES_CAPSULE_NAME_PYMEM);
        if (p) {
            PyMem_Free(p);
        }
    }

    The PyCapsule API is used by multiple extension modules:

    • _ctypes: allocate memory on the heap and uses a destructor to release it

    • _curses: static variable, PyInit__curses() sets PyCurses_API[0] to &PyCursesWindow_Type (static type)

    • _datetime: static variable, PyInit__datetime() creates a timezone object and stores it into CAPI.TimeZone_UTC

    • _decimal: static variable

    • _socket: static variable, PyInit__socket() sets PySocketModuleAPI.error to PyExc_OSError, and sets PySocketModuleAPI.timeout_error to _socket.timeout (a new exception object)

    • pyexpat: static varaible

    • unicodedata: static variable

    • posix: nt._add_dll_directory() creates a PyCapsule using AddDllDirectory() result as a the pointer value
      The _datetime module overrides the

    --

    See also the PEP-630 "Isolating Extension Modules".

    @vstinner vstinner added 3.10 only security fixes topic-C-API labels Sep 16, 2020
    @vstinner
    Copy link
    Member Author

    unicodedata: static variable

    Right now, it's ok-ish. Mohamed Koubaa is working on PR 22145 to pass a module state into internal functions, and so the PyCapsule pointer must be different in each module instance.

    @encukou
    Copy link
    Member

    encukou commented Oct 20, 2020

    Also note that the capsule generally needs to hold references to the objects in exposes, and not rely on the module object to keep things alive. (Module objects with multi-phase init are not unique nor immortal.)
    _ctypes is an exception, since it doesn't have ant PyObject*. But in most others the destructor should also contain some DECREFs.

    @vstinner
    Copy link
    Member Author

    Also note that the capsule generally needs to hold references to the objects in exposes

    Oh right, that's a great advice!

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 3, 2021

    New changeset 7c83eaa by Hai Shi in branch 'master':
    bpo-41798: pyexpat: Allocate the expat_CAPI on the heap memory (GH-24061)
    7c83eaa

    @erlend-aasland
    Copy link
    Contributor

    Hai Shi, do you mind if I have a go at the socket module CAPI capsule?

    @shihai1991
    Copy link
    Member

    Hai Shi, do you mind if I have a go at the socket module CAPI capsule?

    Welcome to join this bpo, you can enhance those extension modules if you like :)

    @erlend-aasland
    Copy link
    Contributor

    Welcome to join this bpo, you can enhance those extension modules if you like :)

    Thanks :) I'll take a look at socket first.

    @tiran
    Copy link
    Member

    tiran commented Jan 5, 2021

    PySocketModuleAPI.timeout_error now points to builtin TimeoutError exception.

    I'm also in the process of removing PySocketModuleAPI from _ssl.c.

    @erlend-aasland
    Copy link
    Contributor

    I'm also in the process of removing PySocketModuleAPI from _ssl.c.

    That's the only user of the API, right? In that case, I'll target _decimal instead. Hai Shi is working on _curses and _datetime, AFAIK.

    @erlend-aasland
    Copy link
    Contributor

    I'm also in the process of removing PySocketModuleAPI from _ssl.c.

    BTW, do we still want to keep the socket CAPI for external users?

    @tiran
    Copy link
    Member

    tiran commented Jan 5, 2021

    BTW, do we still want to keep the socket CAPI for external users?

    Yes, at least for a while. socket.h is not part of the public include headers. It's still possible that the CAPI is used by an external project.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 6, 2021

    New changeset fe9f446 by Erlend Egeberg Aasland in branch 'master':
    bpo-41798: Allocate _decimal extension module C API on the heap (GH-24117)
    fe9f446

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 6, 2021

    New changeset f22b7ca by Erlend Egeberg Aasland in branch 'master':
    bpo-41798: Allocate _socket module C API on the heap (GH-24126)
    f22b7ca

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 6, 2021

    New changeset 1ab0459 by Hai Shi in branch 'master':
    bpo-41798: Allocate the _datetime.datetime_CAPI on the heap memory (GH-24096)
    1ab0459

    @vstinner
    Copy link
    Member Author

    New changeset 61d2639 by Erlend Egeberg Aasland in branch 'master':
    bpo-41798: Allocate unicodedata CAPI on the heap (GH-24128)
    61d2639

    @vstinner
    Copy link
    Member Author

    New changeset 2f12a1b by Hai Shi in branch 'master':
    bpo-41798: Allocate the _curses._C_API on the heap memory (GH-24186)
    2f12a1b

    @shihai1991
    Copy link
    Member

    I have checked all capi instances have been allocated in the heap memory.
    So I think this bpo can be closed ;)

    Thanks Erlend for your contribution.
    Thanks victor, petr for your review&merge.

    @vstinner
    Copy link
    Member Author

    hai shi: "I have checked all capi instances have been allocated in the heap memory. So I think this bpo can be closed ;)"

    I also checks and I confirm that PyCapsule_New() is no longer used with a pointer pointing to static data. Well, there is one last case, in an unit test on the API itself (in _testcapi). This corner case is acceptable ;-)

    Thanks all for the fixes!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes topic-C-API
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants