Exclude static libraries and tests to reduce zipped Python executables #758

tetsuok (Contributor) commented Jul 16, 2022

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature (please, look at the "Scope of the project" section in the README.md file)
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Other... Please describe:

What is the current behavior?

Issue Number: N/A

Build artifacts produced by bazel build with the --build_python_zip option are unnecessarily large: they are at least 90 MB, even for a zipped Python binary that just prints "hello, world". This is a problem when build artifacts are included in Docker images. The cause is that the hermetic Python toolchain includes its static libraries and the tests for the standard library, even though the majority of rules_python users don't need them. These files occupy 47 MB (52% of a 90 MB artifact).
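To cross-check where the bytes go, a small script can classify the archive entries. The following is a minimal sketch, assuming the runfiles layout shown in the unzip listings in this PR: static libraries end in .a, and standard-library tests live in test/ or tests/ directories under lib/pythonX.Y/. The helper names are hypothetical, and the rule is an approximation, not the exact set of glob patterns used by this PR. It sums compress_size (bytes stored in the archive), since that is what the final .zip size is made of.

```python
import zipfile
from pathlib import PurePosixPath


def is_static_lib_or_stdlib_test(name: str) -> bool:
    """Hypothetical classification rule (an approximation, not the PR's globs).

    True for static libraries (*.a) anywhere in the archive, and for files
    that sit in a test/ or tests/ directory below lib/pythonX.Y/.
    """
    path = PurePosixPath(name)
    if path.suffix == ".a":
        return True
    parts = path.parts
    for i in range(len(parts) - 1):
        # Locate the standard-library root, e.g. .../lib/python3.10/...
        if parts[i] == "lib" and parts[i + 1].startswith("python"):
            # Only directory components (not the file name) are checked.
            return any(p in ("test", "tests") for p in parts[i + 2:-1])
    return False


def excludable_share(zip_source) -> tuple[int, int, float]:
    """Return (matching bytes, total bytes, fraction) for a zip archive.

    Sizes are compress_size, i.e. bytes stored in the archive, which is
    what the final .zip size is made of (headers aside).
    """
    with zipfile.ZipFile(zip_source) as zf:
        infos = [info for info in zf.infolist() if not info.is_dir()]
    total = sum(info.compress_size for info in infos)
    matching = sum(
        info.compress_size
        for info in infos
        if is_static_lib_or_stdlib_test(info.filename)
    )
    return matching, total, (matching / total if total else 0.0)
```

Pointing excludable_share at bazel-bin/hello.zip gives a quick way to sanity-check the 47 MB / 52% figures; a noticeable mismatch would suggest the approximate rule differs from what the PR actually excludes.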

Repro steps

WORKSPACE:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_python",
    sha256 = "a3a6e99f497be089f81ec082882e40246bfd435f52f4e82f37e89449b04573f6",
    strip_prefix = "rules_python-0.10.2",
    url = "https://github.com/bazelbuild/rules_python/archive/refs/tags/0.10.2.tar.gz",
)

load("@rules_python//python:repositories.bzl", "python_register_toolchains")

python_register_toolchains(
    name = "python3_10",
    python_version = "3.10.4",
)

BUILD:

load("@rules_python//python:defs.bzl", "py_binary")

py_binary(
    name = "hello",
    srcs = ["hello.py"],
)

hello.py:

print("hello, world!")

Build and check the size:

$ bazel build --build_python_zip //:hello
$ ls -lh bazel-bin/hello.zip
-r-xr-xr-x 1 t docker 91M Jul 17 11:55 bazel-bin/hello.zip

Several static libraries are included in the zip file. Note that libpython3.10.a appears twice, and each copy is 39,633,662 bytes (about 38 MiB) uncompressed:

$ unzip -l bazel-bin/hello.zip | sort -n | grep ".a$"
     2356  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/itcl4.2.2/libitclstub4.2.2.a
   149348  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/thread2.8.7/libthread2.8.7.a
   496010  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/itcl4.2.2/libitcl4.2.2.a
   526406  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/Tix8.4.3/libTix8.4.3.a
 39633662  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/libpython3.10.a
 39633662  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.a
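Duplicates like the two libpython3.10.a copies can be found without extracting anything, because every zip entry records its CRC-32 and uncompressed size. The sketch below (hypothetical helper name duplicate_entries) groups entries on that pair using Python's standard zipfile module. Matching CRC-32 and size make identical content very likely but do not prove it.

```python
import zipfile
from collections import defaultdict


def duplicate_entries(zip_source) -> list[tuple[int, list[str]]]:
    """Group entries with the same CRC-32 and uncompressed size.

    Returns (size, names) pairs, largest first. Equal CRC-32 and size make
    identical content very likely but do not prove it; compare the bytes
    when certainty matters.
    """
    groups: dict[tuple[int, int], list[str]] = defaultdict(list)
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Empty files (e.g. blank __init__.py) would all collide; skip them.
            if info.is_dir() or info.file_size == 0:
                continue
            groups[(info.CRC, info.file_size)].append(info.filename)
    # Keep only groups with more than one member.
    dups = [
        (size, sorted(names))
        for (_crc, size), names in groups.items()
        if len(names) > 1
    ]
    # Largest duplicated files first.
    dups.sort(key=lambda item: item[0], reverse=True)
    return dups
```

For each (size, names) pair, size * (len(names) - 1) uncompressed bytes are redundant. Run against the hello.zip above, one would expect the two libpython3.10.a entries (and, later in this thread, the two libpython3.10 shared libraries) to show up this way.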

What is the new behavior?

The static libraries and the standard-library tests shipped with the Python toolchain are excluded from the runfiles of targets built with the Python rules. As a result, artifacts built with bazel build --build_python_zip get smaller: the minimum artifact size (the hello-world zip above) drops to about 43 MB, a saving of 47 MB.

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

thundergolfer left a comment

This change looks safe to me. As far as I can tell, these .a files are only being included in distributions to be re-consumed by other build processes and can't be useful in a Bazel Python runtime.


There's also a bunch of .tcl files that may be useless...

Co-authored-by: Jonathon Belotti <jonathon@canva.com>
tetsuok (Contributor, Author) commented Jul 19, 2022

> There's also a bunch of .tcl files that may be useless...

Yes, those could be removed, but removing standard-library modules seems debatable to me. I think we should first consider removing libpython{python_version}.so (libpython3.10.so for Python 3.10): the --build_python_zip option puts two identical shared libraries, libpython{python_version}.so and libpython{python_version}.so.1.0, into the zip file, and these shared libraries dominate its total size:

$ unzip -l bazel-bin/hello.zip | sort -n | tail -n 6
   745004  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/python3.10/pydoc_data/topics.py
   816725  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/python3.10/ensurepip/_bundled/setuptools-58.1.0-py3-none-any.whl
  2123599  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/python3.10/ensurepip/_bundled/pip-22.0.4-py3-none-any.whl
 38861696  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/libpython3.10.so
 38861696  2010-01-01 00:00   runfiles/python3_10_x86_64-unknown-linux-gnu/lib/libpython3.10.so.1.0
110513272                     2381 files

The zip file needs only one of the two shared libraries. libpython{python_version}.so.1.0 is enough, since it is the one referenced by //python:libpython. Removing the other one saves another 16 MB (43 MB → 27 MB).
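Removing a 38,861,696-byte shared library saves about 16 MB rather than 38 MB. A plausible explanation, assumed here rather than verified in this thread, is that unzip -l reports uncompressed sizes while the archive stores entries compressed. The sketch below (hypothetical helper name largest_entries) lists both sizes for the largest entries with Python's standard zipfile module, similar to what unzip -v shows.

```python
import zipfile


def largest_entries(zip_source, n: int = 5) -> list[tuple[str, int, int]]:
    """Return the n largest entries as (name, uncompressed bytes, stored bytes).

    unzip -l shows only the uncompressed size (file_size); what removing an
    entry saves in the .zip is closer to its stored size (compress_size),
    plus a small amount of per-entry header overhead.
    """
    with zipfile.ZipFile(zip_source) as zf:
        rows = [
            (info.filename, info.file_size, info.compress_size)
            for info in zf.infolist()
            if not info.is_dir()
        ]
    # Sort by uncompressed size, matching `unzip -l | sort -n | tail`.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]
```

Comparing the third column for libpython3.10.so against the 16 MB figure would confirm or rule out the compression explanation.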
