gh-134160: Split extension module init from PyModule docs; emphasize multi-phase init #135126


Open: wants to merge 3 commits into base: main
246 changes: 246 additions & 0 deletions Doc/c-api/extension-modules.rst
@@ -0,0 +1,246 @@
.. highlight:: c

.. _extension-modules:

Defining Extension Modules
--------------------------

A C extension for CPython is a shared library (for example, a ``.so`` file
on Linux or a ``.pyd`` DLL on Windows) that can be loaded into the Python
process (which requires, for example, compatible compiler settings) and
that exports an :ref:`initialization function <extension-export-hook>`.

To be importable by default (that is, by
:py:class:`importlib.machinery.ExtensionFileLoader`),
the shared library must be available on :py:attr:`sys.path`,
and must be named after the module name plus an extension listed in
:py:attr:`importlib.machinery.EXTENSION_SUFFIXES`.
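
As a rough illustration of this naming rule, the following Python sketch lists
the file names that would be accepted for a top-level module on the running
interpreter. The helper ``candidate_filenames`` and the module name ``spam``
are hypothetical; only ``importlib.machinery.EXTENSION_SUFFIXES`` comes from
the standard library, and its contents vary by platform and build.

```python
import importlib.machinery


def candidate_filenames(module_name):
    """Return the file names accepted for a top-level *module_name*.

    Hypothetical helper: it only joins the module name with each
    extension suffix that the running interpreter recognizes.
    """
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


# The printed names depend on the platform and the Python build.
for filename in candidate_filenames("spam"):
    print(filename)
```

Any one of the printed names, placed in a directory on ``sys.path``, should
satisfy the naming requirement described above.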

.. note::

Building, packaging and distributing extension modules is best done with
third-party tools, and is out of scope of this document.
One suitable tool is ``setuptools``, whose documentation can be found at
https://setuptools.pypa.io/en/latest/setuptools.html.

Normally, the initialization function returns a module definition initialized
using :c:func:`PyModuleDef_Init`.
This allows splitting the creation process into several phases:

- Before any substantial code is executed, Python can determine which
capabilities the module supports, and it can adjust the environment or
refuse loading an incompatible extension.
- By default, Python itself creates the module object -- that is, it does
the equivalent of :py:meth:`object.__new__` for classes.
It also sets initial attributes like :attr:`~module.__package__` and
:attr:`~module.__loader__`.
- Afterwards, the module object is initialized using extension-specific
code -- the equivalent of :py:meth:`~object.__init__` on classes.
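
The split between creation and initialization can be sketched at the Python
level with the import system's loader protocol, which has a similar
create/execute structure. This is only an analogy, not the C mechanism itself:
``ToyLoader``, the module name ``toy`` and the attribute ``answer`` are
hypothetical names chosen for illustration.

```python
import importlib.abc
import importlib.util


class ToyLoader(importlib.abc.Loader):
    """Hypothetical pure-Python loader with separate create and exec steps."""

    def create_module(self, spec):
        # Returning None asks the import machinery to create the module
        # object itself, which corresponds to the default described above.
        return None

    def exec_module(self, module):
        # Module-specific initialization: populate the existing module.
        module.answer = 42


spec = importlib.util.spec_from_loader("toy", ToyLoader())

# Creation phase: the module object exists and carries import metadata,
# but no module-specific code has run yet.
module = importlib.util.module_from_spec(spec)
created_only = not hasattr(module, "answer")

# Execution phase: the loader fills in the module's contents.
spec.loader.exec_module(module)

print(module.__name__, created_only, module.answer)
```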

This is called *multi-phase initialization* to distinguish it from the legacy
(but still supported) *single-phase initialization* scheme,
where the initialization function returns a fully constructed module.
See the :ref:`single-phase-initialization section below <single-phase-initialization>`
for details.

.. versionchanged:: 3.5

Added support for multi-phase initialization (:pep:`489`).


Multiple module instances
.........................

By default, extension modules are not singletons.
For example, if the :py:attr:`sys.modules` entry is removed and the module
is re-imported, a new module object is created, and typically populated with
fresh method and type objects.
The old module is subject to normal garbage collection.
This mirrors the behavior of pure-Python modules.
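
The pure-Python behavior referred to here can be observed directly. The sketch
below uses ``colorsys`` only because it is a small pure-Python standard-library
module; a multi-phase extension module is expected to behave the same way,
although that is not what this example exercises.

```python
import importlib
import sys

# First import: creates (or reuses) the module object in sys.modules.
one = importlib.import_module("colorsys")

# Remove the cache entry and import again: the module code runs anew.
del sys.modules["colorsys"]
two = importlib.import_module("colorsys")

# Both the module objects and the functions they define are distinct.
print(one is two)
print(one.rgb_to_hsv is two.rgb_to_hsv)

# Restore the original entry so later imports see a single module object.
sys.modules["colorsys"] = one
```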

Additional module instances may be created in
:ref:`sub-interpreters <sub-interpreter-support>`
or after Python runtime reinitialization
(:c:func:`Py_Finalize` and :c:func:`Py_Initialize`).
In these cases, sharing Python objects between module instances would likely
cause crashes or undefined behavior.

To avoid such issues, each instance of an extension module should
be *isolated*: changes to one instance should not implicitly affect the others,
and all state owned by the module, including references to Python objects,
should be specific to a particular module instance.
See :ref:`isolating-extensions-howto` for more details and a practical guide.

A simpler way to avoid these issues is
:ref:`raising an error on repeated initialization <isolating-extensions-optout>`.

All modules are expected to support
:ref:`sub-interpreters <sub-interpreter-support>`, or otherwise explicitly
signal a lack of support.
This is usually achieved by isolation or by blocking repeated initialization,
as described above.
A module may also be limited to the main interpreter using
the :c:data:`Py_mod_multiple_interpreters` slot.


.. _extension-export-hook:

Initialization function
.......................

The initialization function defined by an extension module has the
following signature:

.. c:function:: PyObject* PyInit_modulename(void)

Its name should be :samp:`PyInit_{<name>}`, with ``<name>`` replaced by the
name of the module.

For modules with ASCII-only names, the function must be named
:samp:`PyInit_{<name>}`, with ``<name>`` replaced by the name of the module.
When using :ref:`multi-phase-initialization`, non-ASCII module names
are allowed. In this case, the initialization function name is
:samp:`PyInitU_{<name>}`, with ``<name>`` encoded using Python's
*punycode* encoding with hyphens replaced by underscores. In Python:

.. code-block:: python

    def initfunc_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyInit' + suffix
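
As a usage sketch, the function can be applied to an ASCII and a non-ASCII
name. The function body is repeated so that the example runs on its own; the
module names ``spam`` and ``café`` are hypothetical.

```python
def initfunc_name(name):
    # Same logic as the function shown above, repeated so that this
    # example is self-contained.
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix


ascii_symbol = initfunc_name('spam')
unicode_symbol = initfunc_name('café')

print(ascii_symbol)    # b'PyInit_spam'
print(unicode_symbol)  # begins with b'PyInitU_' and contains only ASCII
```

In both cases the result is a plain ASCII symbol name, which is what allows it
to be exported from a shared library.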

It is recommended to define the initialization function using a helper macro:

.. c:macro:: PyMODINIT_FUNC

Declare an extension module initialization function.
This macro:

* specifies the :c:expr:`PyObject*` return type,
* adds any special linkage declarations required by the platform, and
* for C++, declares the function as ``extern "C"``.

For example, a module called ``spam`` would be defined like this::

    static struct PyModuleDef spam_module = {
        .m_base = PyModuleDef_HEAD_INIT,
        .m_name = "spam",
        ...
    };

    PyMODINIT_FUNC
    PyInit_spam(void)
    {
        return PyModuleDef_Init(&spam_module);
    }

It is possible to export multiple modules from a single shared library by
defining multiple initialization functions. However, importing them requires
using symbolic links or a custom importer, because by default only the
function corresponding to the filename is found.
See the *"Multiple modules in one library"* section in :pep:`489` for details.
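
A custom importer of this kind can be quite small. The following is a minimal
sketch in the spirit of the approach PEP 489 describes, built only from
standard ``importlib`` calls; the helper name ``load_from_library`` is
hypothetical, and the sketch does not register the result in ``sys.modules``.

```python
import importlib.machinery
import importlib.util


def load_from_library(name, path):
    """Import extension module *name* from the shared library at *path*.

    Hypothetical helper: because the module name is passed explicitly,
    the import machinery looks up the initialization function for *name*
    rather than the one implied by the file name.
    """
    loader = importlib.machinery.ExtensionFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

For instance, ``load_from_library("eggs", "/path/to/spam.so")`` (both names are
placeholders) would attempt to call ``PyInit_eggs`` in a library that is named
after ``spam``. An ``ImportError`` is raised if the library cannot be loaded or
does not export that function.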

The initialization function is typically the only non-\ ``static``
item defined in the module's C source.


.. _multi-phase-initialization:

Multi-phase initialization
..........................

Normally, the :ref:`initialization function <extension-export-hook>`
(``PyInit_modulename``) returns a :c:type:`PyModuleDef` instance with
non-``NULL`` :c:member:`~PyModuleDef.m_slots`.
Before it is returned, the ``PyModuleDef`` instance must be initialized
using the following function:


.. c:function:: PyObject* PyModuleDef_Init(PyModuleDef *def)

Ensure a module definition is a properly initialized Python object that
correctly reports its type and a reference count.

Return *def* cast to ``PyObject*``, or ``NULL`` if an error occurred.

Calling this function is required for :ref:`multi-phase-initialization`.
It should not be used in other contexts.

Note that Python assumes that ``PyModuleDef`` structures are statically
allocated.
This function may return either a new reference or a borrowed one;
this reference must not be released.

.. versionadded:: 3.5


.. _single-phase-initialization:

Legacy single-phase initialization
..................................

.. attention::
Single-phase initialization is a legacy mechanism to initialize extension
modules, with known drawbacks and design flaws. Extension module authors
are encouraged to use multi-phase initialization instead.

In single-phase initialization, the
:ref:`initialization function <extension-export-hook>` (``PyInit_modulename``)
should create, populate and return a module object.
This is typically done using :c:func:`PyModule_Create` and functions like
:c:func:`PyModule_AddObjectRef`.

Single-phase initialization differs from the :ref:`default <multi-phase-initialization>`
in the following ways:

* Single-phase modules are, or rather *contain*, “singletons”.

When the module is first initialized, Python saves the contents of
the module's ``__dict__`` (that is, typically, the module's functions and
types).

For subsequent imports, Python does not call the initialization function
again.
Instead, it creates a new module object with a new ``__dict__``, and copies
the saved contents to it.
For example, given a single-phase module ``_testsinglephase``
[#testsinglephase]_ that defines a function ``sum`` and an exception class
``error``:

.. code-block:: python

>>> import sys
>>> import _testsinglephase as one
>>> del sys.modules['_testsinglephase']
>>> import _testsinglephase as two
>>> one is two
False
>>> one.__dict__ is two.__dict__
False
>>> one.sum is two.sum
True
>>> one.error is two.error
True

The exact behavior should be considered a CPython implementation detail.

* To work around the fact that ``PyInit_modulename`` does not take a *spec*
argument, some state of the import machinery is saved and applied to the
first suitable module created during the ``PyInit_modulename`` call.
Specifically, when a sub-module is imported, this mechanism prepends the
parent package name to the name of the module.

A single-phase ``PyInit_modulename`` function should create “its” module
object as soon as possible, before any other module objects can be created.

* Non-ASCII module names (``PyInitU_modulename``) are not supported.

* Single-phase modules support module lookup functions like
:c:func:`PyState_FindModule`.

.. [#testsinglephase] ``_testsinglephase`` is an internal module used \
in CPython's self-test suite; your installation may or may not \
include it.
1 change: 1 addition & 0 deletions Doc/c-api/index.rst
@@ -17,6 +17,7 @@ document the API functions in detail.
veryhigh.rst
refcounting.rst
exceptions.rst
extension-modules.rst
utilities.rst
abstract.rst
concrete.rst
26 changes: 2 additions & 24 deletions Doc/c-api/intro.rst
@@ -111,33 +111,11 @@ Useful macros
 =============
 
 Several useful macros are defined in the Python header files. Many are
-defined closer to where they are useful (e.g. :c:macro:`Py_RETURN_NONE`).
+defined closer to where they are useful (e.g. :c:macro:`Py_RETURN_NONE`,
+:c:macro:`PyMODINIT_FUNC`).
 Others of a more general utility are defined here. This is not necessarily a
 complete listing.
 
-.. c:macro:: PyMODINIT_FUNC
-
-   Declare an extension module ``PyInit`` initialization function. The function
-   return type is :c:expr:`PyObject*`. The macro declares any special linkage
-   declarations required by the platform, and for C++ declares the function as
-   ``extern "C"``.
-
-   The initialization function must be named :samp:`PyInit_{name}`, where
-   *name* is the name of the module, and should be the only non-\ ``static``
-   item defined in the module file. Example::
-
-       static struct PyModuleDef spam_module = {
-           .m_base = PyModuleDef_HEAD_INIT,
-           .m_name = "spam",
-           ...
-       };
-
-       PyMODINIT_FUNC
-       PyInit_spam(void)
-       {
-           return PyModuleDef_Init(&spam_module);
-       }
-
 
 .. c:macro:: Py_ABS(x)
 