Skip to content

BLD Use Cython's shared memoryview utility to reduce wheel size #31151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Apr 5, 2025

Reference Issues/PRs

Towards #27767

What does this implement/fix? Explain your changes.

This PR updates meson.build to use Cython 3.1's --generate-shared= functionality.

Any other comments?

When testing this locally on MacOS, I get these sizes for scikit_learn-1.7.dev0-cp312-cp312-macosx_15_0_arm64.whl

  • This PR: 8.6 MB
  • main: 15.2 MB

We can see that the wheels from this PR is smaller by ~ 25%

The wheels from this PR: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544?pr=31151
The wheels from on main: https://github.com/scikit-learn/scikit-learn/actions/runs/14278107805

Copy link

github-actions bot commented Apr 5, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 74c600e. Link to the linter CI: here

@thomasjpfan thomasjpfan marked this pull request as draft April 5, 2025 16:21
@thomasjpfan
Copy link
Member Author

thomasjpfan commented Apr 5, 2025

Looks like wheel building works!

There is an issue with windows build because there is no 3.13t pandas build: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544/job/40036367623?pr=31151 (Also failing on main)

We can see that the wheels from this PR is smaller by ~ 25%

The wheels from this PR: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544?pr=31151
The wheels from on main: https://github.com/scikit-learn/scikit-learn/actions/runs/14278107805

@jeremiedbb
Copy link
Member

Thanks for trying this out !
Let's hope they release 3.1.0 before 1.7 😄

For the sake of precision, you wrote that it fixes #27767 but I think there are still suggestions that we haven't tried yet there that can add even more size reduction (#27767 (comment)).

@lesteve
Copy link
Member

lesteve commented Apr 9, 2025

Thanks for the PR, the general direction seems fine. Even if we don't require a recent enough version the cython >= 3.1.0 branch will be tested in the scipy-dev build.

FYI I have a PR working around the lack of Windows free-threaded wheel for pandas: #31159.

There is also some investigation on the pandas side see pandas-dev/pandas#61242 and pandas-dev/pandas#61249 (apparently the likely culprit is cython-dev).

subdir: 'sklearn',
cython_args: cython_args,
install: true,
install_tag: 'python-runtime',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully understood what install tags were used for but we are not using them in scikit-learn for now so I would remove this line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The install tags are quite useful from a packaging perspective. You may use them for adding convenience for downstream packagers for reducing the size of the binaries further. Here's an example PR from NumPy: numpy/numpy#26274

For reference: https://docs.scipy.org/doc/scipy/building/redistributable_binaries.html

Additional context: recently, we released https://github.com/pyodide/pyodide-build/releases/v0.30.0, where I added a feature that allows building a package without isolation. I'm now planning to use this support in NumPy/SciPy for Pyodide so that we can have an easier way to unvendor the tests away from the WASM binaries. If scikit-learn can also implement these, it makes the wheels smaller, and that means that the interactive docs use less bandwidth and load a tad faster.

See also: scikit-image/scikit-image#7470

@thomasjpfan thomasjpfan marked this pull request as ready for review April 10, 2025 01:36
@@ -22,7 +22,7 @@ foreach name: name_list
# TODO in principle this should go in py.exension_module below. This is
# temporary work-around for dependency issue with .pyx.tp files. For more
# details, see https://github.com/mesonbuild/meson/issues/13212
depends: [linear_model_cython_tree, utils_cython_tree],
Copy link
Member

@lesteve lesteve Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current understanding is that you only addded cython_shared_module to the dependencies of the custom targets doing pyx.tp -> pyx.

Do you know why this is needed only there and not in all py.extension_module?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why exactly, but locally not adding cython_shared_module to dependencies seems to work fine. @thomasjpfan do you confirm?

It seems like when you do cython --shared=sklearn._cyutility you don't need sklearn._cyutility to exist at build time, but probably at import time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for me, not adding cython_shared_module to dependencies works on CI and locally.

@matusvalo With cython --shared=sklearn._cyutility is it specify that module for import time and not build time?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like when you do cython --shared=sklearn._cyutility you don't need sklearn._cyutility to exist at build time, but probably at import time.

No, it is not build time dependency. Using --shared=sklearn._cyutility will just cause that sklearn._cyutility module is cimported during module load. Nothing more.

@thomasjpfan thomasjpfan marked this pull request as draft April 10, 2025 21:09
@thomasjpfan thomasjpfan marked this pull request as ready for review April 11, 2025 01:14
Copy link
Member Author

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that sklearn._cyutility is not a build dependency, it does not need to be passed around as depends or dependencies

I tested this on wheel CI with Cython==3.1.0b1 here: https://github.com/scikit-learn/scikit-learn/actions/runs/14390353392?pr=31151

@lesteve lesteve changed the title Use Cython's shared memoryview utility between extension modules BLS Use Cython's shared memoryview utility to reduce wheel size Apr 14, 2025
@lesteve lesteve changed the title BLS Use Cython's shared memoryview utility to reduce wheel size BLD Use Cython's shared memoryview utility to reduce wheel size Apr 14, 2025
Copy link
Member

@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

I pushed a commit with [scipy-dev] just in case, since this will be the build that tests the cython >= 3.1.0 if we merge this/

I guess something to decide is whether we are happy to merge this now or wait for Cython 3.1.0 to be relased.

If we merge this now, the slight downside is that for wheels the change will be picked automatically as soon as cython 3.1.0 is released.

I am fine merging this now, the scipy-dev build will test this in case there are unexpected issues.

One way to know whether cython >= 3.1.0 was used to build scikit-learn is to do python -c 'import sklearn._cyutility'. If that works, cython >= 3.1.0 was used.

@jeremiedbb
Copy link
Member

Given that we are really close to a release, I'd rather not merge it right now, but wait for an actual cython release instead.

@lesteve
Copy link
Member

lesteve commented May 9, 2025

FYI Cython 3.1 has been released on May 8.

@jeremiedbb
Copy link
Member

Cython 3.1 and scikit-learn 1.7 have been release. Let's now merge this one.

I wanted to solve the conflicts but I'm not sure how this addition interacts with #31212. ping @lesteve who'll know better what to do :)

@lesteve
Copy link
Member

lesteve commented Jul 7, 2025

It doesn't feel like a changelog is needed but feel free to write one if you disagree 😉

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @thomasjpfan @lesteve

@jeremiedbb jeremiedbb merged commit ef82b77 into scikit-learn:main Jul 7, 2025
39 of 40 checks passed
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 15, 2025
@jeremiedbb jeremiedbb mentioned this pull request Jul 15, 2025
13 tasks
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
@cbrnr
Copy link
Contributor

cbrnr commented Jul 21, 2025

Just wanted to mention that this change caused a frozen version (i.e., an app created with PyInstaller) to fail on import. Not a big deal, because it can be fixed with the --collect-all sklearn flag, but still something that people should probably be aware of.

Traceback (most recent call last):
  File "mnelab\__main__.py", line 5, in <module>
    from mnelab import main
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "mnelab\__init__.py", line 23, in <module>
    from mnelab.mainwindow import MainWindow
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "mnelab\mainwindow.py", line 31, in <module>
    from mnelab.dialogs import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "mnelab\dialogs\__init__.py", line 18, in <module>
    from mnelab.dialogs.history import HistoryDialog
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "mnelab\dialogs\history.py", line 16, in <module>
    from mnelab.utils import PythonHighlighter
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "mnelab\utils\__init__.py", line 5, in <module>
    from mnelab.utils.dependencies import have
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "mnelab\utils\dependencies.py", line 19, in <module>
    mod = import_module(module_name)
  File "importlib\__init__.py", line 88, in import_module
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "picard\__init__.py", line 23, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "picard\dropin_sklearn.py", line 10, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "sklearn\__init__.py", line 73, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "sklearn\base.py", line 19, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "sklearn\utils\__init__.py", line 9, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "sklearn\utils\_chunking.py", line 11, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "sklearn\utils\_param_validation.py", line 17, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 457, in exec_module
  File "sklearn\utils\validation.py", line 24, in <module>
  File "sklearn/utils/_isfinite.pyx", line 1, in init sklearn.utils._isfinite
ModuleNotFoundError: No module named 'sklearn._cyutility'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants