-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
BLD Reduces size of wheels by stripping symbols #25123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jeremiedbb @ogrisel This resolves the large wheel size problem on Linux. I think it would be nice to have for 1.2 |
From this PR, the total size of the artifacts zip is 142 MB. The latest wheel build in main was 231 MB. When I unzip the artifacts and look at the wheels, I see that the manylinux wheels are around the same size as Windows and macOS. On |
I think it's fine to disable all debug informations for the wheels but I'd like to have an easy way to re-enable them when building in my dev env. Maybe we can have an environnement variable ? |
I updated the PR to support python setup.py build_ext -i --debug Are you okay with this as in a dev environment?
If the above is not agreeable, then I'm okay with adding a environment variable, such as |
Let's use the |
About the change itself I wonder if it won't impact profilers with native code profiling such as py-spy with the --native flag on linux. |
Coming from py-spy's README, stripping symbols will not give the best output. If a developer is using Overall, I think it is still better to strip the symbols for the wheels on PyPI because:
I updated this PR, introducing a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the change, but we should make sure the env variables we have around the codebase are discoverable.
`SKLEARN_BUILD_ENABLE_DEBUG_SYMBOLS` | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
When this environment variable is set to a non zero value, the debug symbols | ||
will be included in the compiled C extensions. Only debug symbols for POSIX | ||
systems is configured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't really belong to parallelism.rst
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this file is named parallelism.rst
, the title is "Parallelism, resource management, and configuration". I'm guessing all the environment variables listed here fall under "configuration", which this PR adds onto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for discoverability, searching for "configuration" places the parallelism.rst
page as second. I'm open to figuring out a better location for all the configuration options, but I do not think it is a blocker for this PR.
This would be a significant improvement on the inference endpoints I have which create a new mamba environment every time, so it makes me happy :D EDIT: oh no, this doesn't affect conda-forge packages lol. Silly me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked the --debug
option. @ogrisel was your remark about the env var just about the name of such a var or that you prefer the env var over this cmd line option ?
if build_with_debug_symbols: | ||
default_extra_compile_args.append("-g") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it not already set by default ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On macOS+clang and Windows+MSCV, the debug symbols are stripped by default, which resulted in the ~9MB wheels on PyPI. With the new environment variable, I can enable debug symbols without changing the source on my local macOS machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as a first iteration to integrate as part of 1.2
.
Further improvements can be:
- restructuring information in
parallelism.rst
which is not about parallelism anymore - using binary stripper, like
strip(1)
to further reduce the size of wheel-shipped shared objects
Edit: I let @jeremiedbb merge if this LGTH.
I just wonder why we ruled out the |
I second that. I have a slight preference for it since it sounds more standard, but no strong opinion. |
With What makes |
Thanks for the clarification. Let's merge then :) |
Reference Issues/PRs
Follow up to #25063
What does this implement/fix? Explain your changes.
This PR sets
-g0
to strip symbols and reduce the file size of the linux wheels.