-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
BLD Use Cython's shared memoryview utility to reduce wheel size #31151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
BLD Use Cython's shared memoryview utility to reduce wheel size #31151
Conversation
❌ Linting issuesThis PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling You can see the details of the linting issues under the
|
Looks like wheel building works! There is an issue with windows build because there is no 3.13t pandas build: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544/job/40036367623?pr=31151 (Also failing on We can see that the wheels from this PR is smaller by ~ 25% The wheels from this PR: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544?pr=31151 |
Thanks for trying this out ! For the sake of precision, you wrote that it fixes #27767 but I think there are still suggestions that we haven't tried yet there that can add even more size reduction (#27767 (comment)). |
Thanks for the PR, the general direction seems fine. Even if we don't require a recent enough version the cython >= 3.1.0 branch will be tested in the scipy-dev build. FYI I have a PR working around the lack of Windows free-threaded wheel for pandas: #31159. There is also some investigation on the pandas side see pandas-dev/pandas#61242 and pandas-dev/pandas#61249 (apparently the likely culprit is cython-dev). |
sklearn/meson.build
Outdated
subdir: 'sklearn', | ||
cython_args: cython_args, | ||
install: true, | ||
install_tag: 'python-runtime', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't fully understood what install tags were used for but we are not using them in scikit-learn for now so I would remove this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The install tags are quite useful from a packaging perspective. You may use them for adding convenience for downstream packagers for reducing the size of the binaries further. Here's an example PR from NumPy: numpy/numpy#26274
For reference: https://docs.scipy.org/doc/scipy/building/redistributable_binaries.html
Additional context: recently, we released https://github.com/pyodide/pyodide-build/releases/v0.30.0, where I added a feature that allows building a package without isolation. I'm now planning to use this support in NumPy/SciPy for Pyodide so that we can have an easier way to unvendor the tests away from the WASM binaries. If scikit-learn can also implement these, it makes the wheels smaller, and that means that the interactive docs use less bandwidth and load a tad faster.
See also: scikit-image/scikit-image#7470
…memory_view_pr' into thomasjpfan/cython_memory_view_pr
sklearn/linear_model/meson.build
Outdated
@@ -22,7 +22,7 @@ foreach name: name_list | |||
# TODO in principle this should go in py.exension_module below. This is | |||
# temporary work-around for dependency issue with .pyx.tp files. For more | |||
# details, see https://github.com/mesonbuild/meson/issues/13212 | |||
depends: [linear_model_cython_tree, utils_cython_tree], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current understanding is that you only addded cython_shared_module
to the dependencies of the custom targets doing pyx.tp
-> pyx
.
Do you know why this is needed only there and not in all py.extension_module
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why exactly, but locally not adding cython_shared_module
to dependencies seems to work fine. @thomasjpfan do you confirm?
It seems like when you do cython --shared=sklearn._cyutility
you don't need sklearn._cyutility
to exist at build time, but probably at import time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for me, not adding cython_shared_module
to dependencies works on CI and locally.
@matusvalo With cython --shared=sklearn._cyutility
is it specify that module for import time and not build time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like when you do cython --shared=sklearn._cyutility you don't need sklearn._cyutility to exist at build time, but probably at import time.
No, it is not build time dependency. Using --shared=sklearn._cyutility
will just cause that sklearn._cyutility
module is cimported during module load. Nothing more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that sklearn._cyutility
is not a build dependency, it does not need to be passed around as depends
or dependencies
I tested this on wheel CI with Cython==3.1.0b1
here: https://github.com/scikit-learn/scikit-learn/actions/runs/14390353392?pr=31151
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
I pushed a commit with [scipy-dev]
just in case, since this will be the build that tests the cython >= 3.1.0
if we merge this/
I guess something to decide is whether we are happy to merge this now or wait for Cython 3.1.0 to be relased.
If we merge this now, the slight downside is that for wheels the change will be picked automatically as soon as cython 3.1.0 is released.
I am fine merging this now, the scipy-dev build will test this in case there are unexpected issues.
One way to know whether cython >= 3.1.0 was used to build scikit-learn is to do python -c 'import sklearn._cyutility'
. If that works, cython >= 3.1.0 was used.
Given that we are really close to a release, I'd rather not merge it right now, but wait for an actual cython release instead. |
Reference Issues/PRs
Towards #27767
What does this implement/fix? Explain your changes.
This PR updates
meson.build
to use Cython 3.1's--generate-shared=
functionality.Any other comments?
When testing this locally on MacOS, I get these sizes for
scikit_learn-1.7.dev0-cp312-cp312-macosx_15_0_arm64.whl
main
: 15.2 MBWe can see that the wheels from this PR is smaller by ~ 25%
The wheels from this PR: https://github.com/scikit-learn/scikit-learn/actions/runs/14283757544?pr=31151
The wheels from on main: https://github.com/scikit-learn/scikit-learn/actions/runs/14278107805