DOC Add a 'Cython Best Practices, Conventions and Knowledge' section #25608

Merged: 17 commits, merged on Mar 9, 2023.

7 changes: 4 additions & 3 deletions doc/developers/contributing.rst
@@ -1449,9 +1449,10 @@ make this task easier and faster (in no particular order).
<https://joblib.readthedocs.io/>`_. ``out`` is then an iterable containing
the values returned by ``some_function`` for each call.
 - We use `Cython <https://cython.org/>`_ to write fast code. Cython code is
-located in ``.pyx`` and ``.pxd`` files. Cython code has a more C-like
-flavor: we use pointers, perform manual memory allocation, etc. Having
-some minimal experience in C / C++ is pretty much mandatory here.
+located in ``.pyx`` and ``.pxd`` files. Cython code has a more C-like flavor:
+we use pointers, perform manual memory allocation, etc. Having some minimal
+experience in C / C++ is pretty much mandatory here. For more information see
+:ref:`cython`.
 - Master your tools.

 - With such a big project, being efficient with your favorite editor or
143 changes: 143 additions & 0 deletions doc/developers/cython.rst
@@ -0,0 +1,143 @@
.. _cython:

Cython Best Practices, Conventions and Knowledge
================================================

This document gives tips for developing Cython code in scikit-learn.

Tips for developing with Cython in scikit-learn
-----------------------------------------------

Tips to ease development
^^^^^^^^^^^^^^^^^^^^^^^^

* Time spent reading `Cython's documentation <https://cython.readthedocs.io/en/latest/>`_ is not time lost.

* If you intend to use OpenMP: on macOS, the system's distribution of ``clang`` does not implement OpenMP.
You can install the ``compilers`` package available on ``conda-forge``, which comes with an implementation of OpenMP.

* Activating `checks <https://github.com/scikit-learn/scikit-learn/blob/62a017efa047e9581ae7df8bbaa62cf4c0544ee4/sklearn/_build_utils/__init__.py#L68-L87>`_ might help. For instance, to activate ``boundscheck``, set the following environment variable before (re)building scikit-learn:

.. code-block:: bash

export SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES=1

* `Start from scratch in a notebook <https://cython.readthedocs.io/en/latest/src/quickstart/build.html#using-the-jupyter-notebook>`_ to understand how to use Cython and to get feedback on your work quickly.
If you plan to use OpenMP for your implementations in your Jupyter Notebook, add extra compiler and linker arguments to the Cython magic. Since ``%%cython`` must be the first line of the cell, use only the line matching your compiler:

.. code-block:: python

# For GCC and for clang
%%cython --compile-args=-fopenmp --link-args=-fopenmp
# For Microsoft's compilers
%%cython --compile-args=/openmp --link-args=/openmp

* To debug C code (e.g. a segfault), do use ``gdb`` with:

.. code-block:: bash

gdb --ex r --args python ./entrypoint_to_bug_reproducer.py

* To inspect a value while debugging code in a ``cdef`` ``nogil`` context, temporarily reacquire the GIL and print it:

.. code-block:: cython

    with gil:
        print(state_to_print)

* Note that Cython cannot parse f-strings with ``{var=}`` expressions, e.g.

.. code-block:: cython

print(f"{test_val=}")

* The scikit-learn codebase has a lot of non-unified (re)definitions of (fused) types.
There is currently `ongoing work to simplify and unify that across the codebase
<https://github.com/scikit-learn/scikit-learn/issues/25572>`_.
For now, make sure you understand which concrete types are ultimately used.

* You might find this alias handy for compiling individual Cython extensions:

.. code-block:: bash

# You might want to add this alias to your shell script config.
alias cythonX="cython -X language_level=3 -X boundscheck=False -X wraparound=False -X initializedcheck=False -X nonecheck=False -X cdivision=True"

# This generates `source.c` as if you had recompiled scikit-learn entirely.
cythonX source.pyx

* Using the ``--annotate`` option with this alias generates an HTML report of code annotation.
This report indicates interactions with the CPython interpreter on a line-by-line basis.
Interactions with the CPython interpreter must be avoided as much as possible in
the computationally intensive sections of the algorithms.
For more information, please refer to `this section of Cython's tutorial <https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html#primes>`_.

.. code-block:: bash

# This generates an HTML report (`source.html`) for `source.c`.
cythonX --annotate source.pyx
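
The ``cythonX`` alias above can also be scripted, which is handy on systems without a POSIX shell. The following is a minimal sketch, not a scikit-learn utility: the names ``CYTHON_DIRECTIVES``, ``cython_directive_args`` and ``compile_pyx`` are hypothetical, and it assumes the ``cython`` executable is on your ``PATH``. It passes the same ``-X`` directives as the alias.

.. code-block:: python

    import shutil
    import subprocess

    # Same directives as the (hypothetical) ``cythonX`` alias above.
    CYTHON_DIRECTIVES = {
        "language_level": "3",
        "boundscheck": "False",
        "wraparound": "False",
        "initializedcheck": "False",
        "nonecheck": "False",
        "cdivision": "True",
    }


    def cython_directive_args(directives=CYTHON_DIRECTIVES):
        """Turn a directive mapping into ``-X name=value`` command-line arguments."""
        args = []
        for name, value in directives.items():
            args += ["-X", f"{name}={value}"]
        return args


    def compile_pyx(source, annotate=False):
        """Hypothetical helper: translate ``source`` (a .pyx file) to C with ``cython``.

        With ``annotate=True``, ``cython`` also writes the HTML annotation report.
        """
        cython = shutil.which("cython")
        if cython is None:
            raise RuntimeError("the `cython` executable was not found on PATH")
        command = [cython, *cython_directive_args()]
        if annotate:
            command.append("--annotate")
        command.append(str(source))
        subprocess.run(command, check=True)
        return command

For instance, ``compile_pyx("source.pyx", annotate=True)`` would write ``source.c`` and the report ``source.html`` next to ``source.pyx``.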
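
Regarding the ``{var=}`` limitation mentioned above: the same text can be produced with an explicit label and the ``!r`` conversion, since ``{var=}`` uses ``repr()`` when no format specifier is given. The snippet below is plain Python (3.8 or later, so that the self-documenting form also runs) and only checks that both spellings agree; ``test_val`` is an arbitrary example variable, and only the explicit spelling is meant for ``.pyx`` files.

.. code-block:: python

    for test_val in (0.5, "abc", [1, 2]):
        # Self-documenting form (Python >= 3.8), not parsed by Cython.
        self_documenting = f"{test_val=}"
        # Explicit equivalent: a literal label followed by the repr of the value.
        explicit = f"test_val={test_val!r}"
        assert self_documenting == explicit
        print(explicit)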

Tips for performance
^^^^^^^^^^^^^^^^^^^^

* Understand the GIL in the context of CPython (which problems it solves, what its limitations are)
and get a good understanding of when Cython will be mapped to C code free of interactions with
CPython, when it will not, and when it cannot (e.g. presence of interactions with Python
objects, which include functions). In this regard, `PEP 703 <https://peps.python.org/pep-0703/>`_
provides a good overview, context, and pathways for the GIL's removal.

* Make sure you have deactivated `checks <https://github.com/scikit-learn/scikit-learn/blob/62a017efa047e9581ae7df8bbaa62cf4c0544ee4/sklearn/_build_utils/__init__.py#L68-L87>`_.

* Always prefer memoryviews over ``cnp.ndarray`` when possible: memoryviews are lightweight.

* Avoid memoryview slicing: memoryview slicing might be costly or misleading in some cases, and
it is better not to use it, even if handling fewer dimensions in some context would be preferable.

Review comment on this line: "FWIW I have a PR that should dramatically improve this. It almost certainly won't (and shouldn't!) make it into Cython 3, but hopefully won't be long after that. Essentially, a lot of the reference counting turns out to be pretty pointless since you're usually incrementing and decrementing exactly the same object so it can be skipped completely. So hopefully this advice can change sometime in the next year. But not yet."

Reply (Member Author): "Thank you for mentioning this, @da-woods."

* Decorate final classes or methods with ``@final`` (this allows removing virtual tables when needed)

* Inline methods and functions when it makes sense

* Make sure your Cython compilation units `use NumPy's recent C API <https://github.com/scikit-learn/scikit-learn/blob/62a017efa047e9581ae7df8bbaa62cf4c0544ee4/setup.py#L64-L70>`_.

* When in doubt, read the generated C or C++ code if you can: "The fewer C instructions and indirections
for a line of Cython code, the better" is a good rule of thumb.

* ``nogil`` declarations are just hints: declaring a ``cdef`` function as ``nogil`` means that
it can be called without holding the GIL, but the GIL is not released when entering it.
You have to release it yourself, either by passing ``nogil=True`` to
``cython.parallel.prange`` explicitly, or by using an explicit context manager:

.. code-block:: cython

    cdef inline int my_func(self) except -1:

        # Some logic interacting with CPython, e.g. allocating arrays via NumPy.

        with nogil:
            # The code here is run as if it were written in C.
            pass

        return 0

This item is based on `this comment from Stefan Behnel <https://github.com/cython/cython/issues/2798#issuecomment-459971828>`_.

* Direct calls to BLAS routines are possible via interfaces defined in ``sklearn.utils._cython_blas``.
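
Typed memoryviews are built on Python's buffer protocol: a view shares the memory of the object it wraps (a NumPy array, an ``array.array``, ...) instead of copying it. The snippet below illustrates this zero-copy behaviour with Python's built-in ``memoryview``; it is a plain-Python analogue rather than Cython code, and ``array.array`` is used only so that it runs without NumPy.

.. code-block:: python

    from array import array

    # A contiguous buffer of 6 C doubles.
    data = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

    # The view wraps the existing buffer; nothing is copied.
    view = memoryview(data)
    print(view.format, view.itemsize, view.shape)  # d 8 (6,)

    # Writes through the view are visible in the original object.
    view[0] = 42.0
    print(data[0])  # 42.0

    # Reinterpret the same memory as a 2 x 3 C-contiguous array, still without
    # copying (``cast`` requires going through a byte format first).
    matrix = view.cast("B").cast("d", shape=[2, 3])
    print(matrix.shape)  # (2, 3)
    print(matrix[1, 2])  # 5.0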

Using OpenMP
^^^^^^^^^^^^

Since scikit-learn can be built without OpenMP, it's necessary to protect each
direct call to OpenMP.

The ``_openmp_helpers`` module, available in
`sklearn/utils/_openmp_helpers.pyx <https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_openmp_helpers.pyx>`_
provides protected versions of the OpenMP routines. To use OpenMP routines, they
must be ``cimported`` from this module and not from the OpenMP library directly:

.. code-block:: cython

from sklearn.utils._openmp_helpers cimport omp_get_max_threads
max_threads = omp_get_max_threads()


The parallel loop, ``prange``, is already protected by Cython and can be used directly
from ``cython.parallel``.
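
To check whether a given scikit-learn installation was actually built with OpenMP, one can query the private helper ``_openmp_parallelism_enabled`` from the same module. This is a sketch relying on a private function that may change without notice; the wrapper ``openmp_build_status`` is hypothetical and returns ``None`` when scikit-learn cannot be imported.

.. code-block:: python

    def openmp_build_status():
        """Hypothetical helper: was this scikit-learn build compiled with OpenMP?

        Returns True or False, or None if scikit-learn cannot be imported.
        Relies on a private scikit-learn function, which may change without notice.
        """
        try:
            from sklearn.utils._openmp_helpers import _openmp_parallelism_enabled
        except ImportError:
            return None
        return bool(_openmp_parallelism_enabled())


    status = openmp_build_status()
    print(f"Built with OpenMP: {status}")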
1 change: 1 addition & 0 deletions doc/developers/index.rst
@@ -19,6 +19,7 @@ Developer's Guide
tips
utilities
performance
cython
advanced_installation
bug_triaging
maintainer
60 changes: 27 additions & 33 deletions doc/developers/performance.rst
@@ -313,8 +313,8 @@ For more details, see the docstrings of the magics, using ``%memit?`` and
``%mprun?``.


-Performance tips for the Cython developer
-=========================================
+Using Cython
+============

If profiling of the Python code reveals that the Python interpreter
overhead is larger by one order of magnitude or more than the cost of the
@@ -325,37 +325,9 @@ standalone function in a ``.pyx`` file, add static type declarations and
then use Cython to generate a C program suitable to be compiled as a
Python extension module.

-The official documentation available at http://docs.cython.org/ contains
-a tutorial and reference guide for developing such a module. In the
-following we will just highlight a couple of tricks that we found
-important in practice on the existing cython codebase in the scikit-learn
-project.
-
-TODO: html report, type declarations, bound checks, division by zero checks,
-memory alignment, direct blas calls...
-
-- https://www.youtube.com/watch?v=gMvkiQ-gOW8
-- https://conference.scipy.org/proceedings/SciPy2009/paper_1/
-- https://conference.scipy.org/proceedings/SciPy2009/paper_2/
-
-Using OpenMP
-------------
-
-Since scikit-learn can be built without OpenMP, it's necessary to protect each
-direct call to OpenMP.
-
-The `_openmp_helpers` module, available in
-`sklearn/utils/_openmp_helpers.pyx <https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_openmp_helpers.pyx>`_
-provides protected versions of the OpenMP routines. To use OpenMP routines, they
-must be cimported from this module and not from the OpenMP library directly::
-
-from sklearn.utils._openmp_helpers cimport omp_get_max_threads
-max_threads = omp_get_max_threads()
-
-.. note::
-
-The parallel loop, `prange`, is already protected by cython and can be used directly
-from `cython.parallel`.
+`Cython's documentation <http://docs.cython.org/>`_ contains a tutorial and
+reference guide for developing such a module.
+For more information about developing in Cython for scikit-learn, see :ref:`cython`.


.. _profiling-compiled-extension:
@@ -376,6 +348,28 @@ Easy profiling without special compilation options use yep:
 - https://pypi.org/project/yep/
 - https://fa.bianp.net/blog/2011/a-profiler-for-python-extensions

Using a debugger, gdb
---------------------

* It is helpful to use ``gdb`` to debug. In order to do so, one must use
a Python interpreter built with debug support (debug symbols and proper
optimization). To create a new conda environment (which you might need
to deactivate and reactivate after building/installing) with a source-built
CPython interpreter:

.. code-block:: bash

git clone https://github.com/python/cpython.git
conda create -n debug-scikit-dev
conda activate debug-scikit-dev
cd cpython
mkdir debug
cd debug
../configure --prefix=$CONDA_PREFIX --with-pydebug
make EXTRA_CFLAGS='-DPy_DEBUG' -j<num_cores>
make install
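
Once the debug interpreter is installed, it can help to have the reproducer script confirm which interpreter it runs on and print the Python-level traceback if the process crashes. The sketch below only uses the standard library: ``sys.gettotalrefcount`` is only available when reference-count debugging is compiled in (as in ``--with-pydebug`` builds), and ``faulthandler`` (also available as ``python -X faulthandler``) dumps the Python traceback on fatal signals such as a segmentation fault. ``reproduce_bug`` is a hypothetical placeholder for the code that triggers the crash.

.. code-block:: python

    import faulthandler
    import sys


    def is_debug_interpreter():
        """Return True if running on a CPython debug build (``--with-pydebug``)."""
        # ``sys.gettotalrefcount`` only exists when reference-count debugging
        # is compiled in, which ``--with-pydebug`` enables.
        return hasattr(sys, "gettotalrefcount")


    def reproduce_bug():
        """Hypothetical placeholder: call the Cython code that crashes here."""


    if __name__ == "__main__":
        # Dump the Python traceback of all threads on SIGSEGV, SIGFPE, SIGABRT, ...
        faulthandler.enable()
        print(f"Python {sys.version.split()[0]}, debug build: {is_debug_interpreter()}")
        reproduce_bug()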


Using gprof
-----------
