Skip to content

python_repository: Exclude pycache files #907

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

illicitonion
Copy link
Contributor

After the first time the python_repository is used, files are created in
__pycache__ directories. These invalidate the configured target for
the python_repository target, and cause spurious rebuilds.

Here's an example of the set of pycache files I've seen in one
repository.

lib/python3.8/distutils/__pycache__/__init__.cpython-38.pyc
lib/python3.8/distutils/__pycache__/archive_util.cpython-38.pyc
lib/python3.8/distutils/__pycache__/cmd.cpython-38.pyc
lib/python3.8/distutils/__pycache__/config.cpython-38.pyc
lib/python3.8/distutils/__pycache__/core.cpython-38.pyc
lib/python3.8/distutils/__pycache__/debug.cpython-38.pyc
lib/python3.8/distutils/__pycache__/dep_util.cpython-38.pyc
lib/python3.8/distutils/__pycache__/dir_util.cpython-38.pyc
lib/python3.8/distutils/__pycache__/dist.cpython-38.pyc
lib/python3.8/distutils/__pycache__/errors.cpython-38.pyc
lib/python3.8/distutils/__pycache__/extension.cpython-38.pyc
lib/python3.8/distutils/__pycache__/fancy_getopt.cpython-38.pyc
lib/python3.8/distutils/__pycache__/file_util.cpython-38.pyc
lib/python3.8/distutils/__pycache__/log.cpython-38.pyc
lib/python3.8/distutils/__pycache__/spawn.cpython-38.pyc
lib/python3.8/distutils/__pycache__/sysconfig.cpython-38.pyc
lib/python3.8/distutils/__pycache__/util.cpython-38.pyc
lib/python3.8/distutils/command/__pycache__/__init__.cpython-38.pyc
lib/python3.8/distutils/command/__pycache__/build.cpython-38.pyc
lib/python3.8/distutils/command/__pycache__/install.cpython-38.pyc
lib/python3.8/email/__pycache__/_header_value_parser.cpython-38.pyc
lib/python3.8/email/__pycache__/contentmanager.cpython-38.pyc
lib/python3.8/email/__pycache__/headerregistry.cpython-38.pyc
lib/python3.8/email/__pycache__/policy.cpython-38.pyc

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature (please, look at the "Scope of the project" section in the README.md file)
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Other... Please describe:

What is the current behavior?

If you cquery twice in a row, you get different output, because extra files were created in the repository by the first cquery invocation.

Issue Number: N/A

What is the new behavior?

The consecutive cqueries now produce identical results.

Does this PR introduce a breaking change?

  • Yes
  • No

@alexeagle
Copy link
Contributor

what version of rules_python? I made a recent fix 3f0d62d that's in 0.14.0

@f0rmiga
Copy link
Member

f0rmiga commented Nov 30, 2022

Are you running as root and set ignore_root_user_error?

@illicitonion
Copy link
Contributor Author

what version of rules_python? I made a recent fix 3f0d62d that's in 0.14.0

I have repro'd in 0.15.0 - this is around the python interpreter's distro itself, rather than than when using pip.

Are you running as root and set ignore_root_user_error?

I am. I could make the globbing conditional on ignore_root_user_error being set, if you think that'd be better?

@f0rmiga
Copy link
Member

f0rmiga commented Dec 1, 2022

When you set that attribute, it's assumed that you stumbled across #713. rules_python is not supported running as root. I'm hesitant to start making exceptions as it involves many side effects (and I'm afraid we don't even know all of them).

@f0rmiga f0rmiga closed this Dec 1, 2022
@illicitonion
Copy link
Contributor Author

Apologies - ignore_root_user_error was being set in a patch I didn't see when I started investigating this, so I actually didn't see #713 before coming here.

I'm an infrastructure provider for dozens of teams who are using Bazel, many of whom are for sure running bazel as root inside docker containers - realistically to use hermetic python interpreters we need a patch like this, and we would really like to not have to keep an internal fork of rules_python if we can avoid it...

What's the reasoning behind making the directory readonly? Presumably part of the point is to avoid generating these pyc files, in which case, excluding them from the glob feels like it's working towards the same aim? Are these other reasons to do so too?

I've added a new commit to https://github.com/illicitonion/rules_python/tree/exclude-pycache-in-python-repository which only excludes these files if you're running as root, to minimise the surface area of the change to "people who have already opted in to ignoring reasonable advice", but I do think it may be worth merging this without the only-when-root patch...

@f0rmiga
Copy link
Member

f0rmiga commented Dec 2, 2022

I noticed that people are running as root by accident, not intentionally. Making the pyc read-only prevents the interpreter from updating them while taking advantage of the improved performance. Why isn't it realistic to stop building as root? As I said, we (rules_python maintainers) don't understand all of the side effects that running as root brings. This was only one of the side effects we caught. Speaking more broadly, and outside Bazel, running as root defeats the whole concept of file permissions on UNIX-like systems, so if we want to keep the "correct" part of { Fast, Correct } — Choose two, the least we can do is stop building as root on UNIX systems.

@gibfahn
Copy link
Contributor

gibfahn commented Dec 4, 2022

Why isn't it realistic to stop building as root?

(context: I work on the same team as @illicitonion )

If running in Docker using User Namespaces, then you really want to be running as root inside the container so you end up mapping to a specific user outside the container. This makes e.g. cleanup tasks outside the container more performant. When running inside a single-use docker container as root, where docker is configured with user namespaces, I don't think it's the case that that "running as root defeats the whole concept of file permissions on UNIX-like systems".

We've been running builds (using a variety of build systems) as root for several years now without issue.

@alexeagle alexeagle reopened this Dec 4, 2022
@f0rmiga
Copy link
Member

f0rmiga commented Dec 6, 2022

@gibfahn thank you for the extra information on how you use Docker. Let's schedule a call to discuss this better.

@pjjw
Copy link

pjjw commented Dec 20, 2022

hey, similar situation here, curious if consensus was reached.

@illicitonion
Copy link
Contributor Author

We had a really useful call, and think the underlying issue here is that https://github.com/indygreg/python-build-standalone includes a somewhat arbitrary subset of .pyc files in its toolchains. @f0rmiga is going to try to arrange upstream that a complete set of .pyc files is included in the toolchains, so that new ones don't get re-generated, and we don't need to ignore them.

Copy link
Member

@f0rmiga f0rmiga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illicitonion could you update this PR? I'll merge as is and create a separate issue to address this upstream.

@f0rmiga f0rmiga mentioned this pull request Jan 23, 2023
@illicitonion illicitonion force-pushed the exclude-pycache-in-python-repository branch 2 times, most recently from b63e665 to 64b0bdf Compare January 24, 2023 11:46
Some of these are missing from the upstream python distributions, and
when they are, if running as root, this causes the files to be generated
the first time their associated .py file is evaluated, causing analysis
and execution cache misses due to extra files matching the repository glob.
@illicitonion illicitonion force-pushed the exclude-pycache-in-python-repository branch from 64b0bdf to 738add7 Compare January 24, 2023 11:53
@illicitonion
Copy link
Contributor Author

@f0rmiga done - thanks!

Wyverald added a commit that referenced this pull request Feb 28, 2025
Currently, by default, rules_python immediately fails when Bazel is run as root. The reasoning behind this involves .pyc files being generated for hermetic toolchains when they're first used, causing cache misses; to work around this, rules_python opts to make the toolchain installation directory read-only, but running Bazel as root would circumvent this. So rules_python actively detects if the current user is root, and hard fails. This check can be disabled by the root module by setting `python.override(ignore_root_user_error=True)`. (See more context in the linked issues/PRs.)

This causes a reverberating effect across the Bazel ecosystem, as rules_python is essentially a dependency of every single Bazel project through protobuf. Effectively, any Bazel project wishing to run as root need to add the override tag above, even if they don't have anything to do with Python at all.

This PR changes the default value of the `ignore_root_user_error` to True instead. Besides, it now unconditionally tries to make the toolchain installation directory read-only, and only outputs a warning if it's detected that the current user is root.

See previous discussions at #713, #749, #907, #1008, #1169, etc.

Fixes #1169.
Wyverald added a commit that referenced this pull request Feb 28, 2025
Currently, by default, rules_python immediately fails when Bazel is run as root. The reasoning behind this involves .pyc files being generated for hermetic toolchains when they're first used, causing cache misses; to work around this, rules_python opts to make the toolchain installation directory read-only, but running Bazel as root would circumvent this. So rules_python actively detects if the current user is root, and hard fails. This check can be disabled by the root module by setting `python.override(ignore_root_user_error=True)`. (See more context in the linked issues/PRs.)

This causes a reverberating effect across the Bazel ecosystem, as rules_python is essentially a dependency of every single Bazel project through protobuf. Effectively, any Bazel project wishing to run as root need to add the override tag above, even if they don't have anything to do with Python at all.

This PR changes the default value of the `ignore_root_user_error` to True instead. Besides, it now unconditionally tries to make the toolchain installation directory read-only, and only outputs a warning if it's detected that the current user is root.

See previous discussions at #713, #749, #907, #1008, #1169, etc.

Fixes #1169.
Wyverald added a commit that referenced this pull request Feb 28, 2025
Currently, by default, rules_python immediately fails when Bazel is run as root. The reasoning behind this involves .pyc files being generated for hermetic toolchains when they're first used, causing cache misses; to work around this, rules_python opts to make the toolchain installation directory read-only, but running Bazel as root would circumvent this. So rules_python actively detects if the current user is root, and hard fails. This check can be disabled by the root module by setting `python.override(ignore_root_user_error=True)`. (See more context in the linked issues/PRs.)

This causes a reverberating effect across the Bazel ecosystem, as rules_python is essentially a dependency of every single Bazel project through protobuf. Effectively, any Bazel project wishing to run as root need to add the override tag above, even if they don't have anything to do with Python at all.

This PR changes the default value of the `ignore_root_user_error` to True instead. Besides, it now unconditionally tries to make the toolchain installation directory read-only, and only outputs a warning if it's detected that the current user is root.

See previous discussions at #713, #749, #907, #1008, #1169, etc.

Fixes #1169.
Wyverald added a commit that referenced this pull request Feb 28, 2025
Currently, by default, rules_python immediately fails when Bazel is run as root. The reasoning behind this involves .pyc files being generated for hermetic toolchains when they're first used, causing cache misses; to work around this, rules_python opts to make the toolchain installation directory read-only, but running Bazel as root would circumvent this. So rules_python actively detects if the current user is root, and hard fails. This check can be disabled by the root module by setting `python.override(ignore_root_user_error=True)`. (See more context in the linked issues/PRs.)

This causes a reverberating effect across the Bazel ecosystem, as rules_python is essentially a dependency of every single Bazel project through protobuf. Effectively, any Bazel project wishing to run as root need to add the override tag above, even if they don't have anything to do with Python at all.

This PR changes the default value of the `ignore_root_user_error` to True instead. Besides, it now unconditionally tries to make the toolchain installation directory read-only, and only outputs a warning if it's detected that the current user is root.

See previous discussions at #713, #749, #907, #1008, #1169, etc.

Fixes #1169.
Wyverald added a commit that referenced this pull request Feb 28, 2025
Currently, by default, rules_python immediately fails when Bazel is run as root. The reasoning behind this involves .pyc files being generated for hermetic toolchains when they're first used, causing cache misses; to work around this, rules_python opts to make the toolchain installation directory read-only, but running Bazel as root would circumvent this. So rules_python actively detects if the current user is root, and hard fails. This check can be disabled by the root module by setting `python.override(ignore_root_user_error=True)`. (See more context in the linked issues/PRs.)

This causes a reverberating effect across the Bazel ecosystem, as rules_python is essentially a dependency of every single Bazel project through protobuf. Effectively, any Bazel project wishing to run as root need to add the override tag above, even if they don't have anything to do with Python at all.

This PR changes the default value of the `ignore_root_user_error` to True instead. Besides, it now unconditionally tries to make the toolchain installation directory read-only, and only outputs a warning if it's detected that the current user is root.

See previous discussions at #713, #749, #907, #1008, #1169, etc.

Fixes #1169.
github-merge-queue bot pushed a commit that referenced this pull request Mar 3, 2025
Currently, by default, rules_python immediately fails when Bazel is run
as root. The reasoning behind this involves .pyc files being generated
for hermetic toolchains when they're first used, causing cache misses;
to work around this, rules_python opts to make the toolchain
installation directory read-only, but running Bazel as root would
circumvent this. So rules_python actively detects if the current user is
root, and hard fails. This check can be disabled by the root module by
setting `python.override(ignore_root_user_error=True)`. (See more
context in the linked issues/PRs.)

This causes a reverberating effect across the Bazel ecosystem, as
rules_python is essentially a dependency of every single Bazel project
through protobuf. Effectively, any Bazel project wishing to run as root
need to add the override tag above, even if they don't have anything to
do with Python at all.

This PR changes the default value of the `ignore_root_user_error` to
True instead. Besides, it now unconditionally tries to make the
toolchain installation directory read-only, and only outputs a warning
if it's detected that the current user is root.

See previous discussions at #713, #749, #907, #1008, #1169, etc.

Fixes #1169.

---------

Co-authored-by: Richard Levasseur <rlevasseur@google.com>
dcode pushed a commit to dcode/rules_python that referenced this pull request Mar 3, 2025
…-contrib#2636)

Currently, by default, rules_python immediately fails when Bazel is run
as root. The reasoning behind this involves .pyc files being generated
for hermetic toolchains when they're first used, causing cache misses;
to work around this, rules_python opts to make the toolchain
installation directory read-only, but running Bazel as root would
circumvent this. So rules_python actively detects if the current user is
root, and hard fails. This check can be disabled by the root module by
setting `python.override(ignore_root_user_error=True)`. (See more
context in the linked issues/PRs.)

This causes a reverberating effect across the Bazel ecosystem, as
rules_python is essentially a dependency of every single Bazel project
through protobuf. Effectively, any Bazel project wishing to run as root
need to add the override tag above, even if they don't have anything to
do with Python at all.

This PR changes the default value of the `ignore_root_user_error` to
True instead. Besides, it now unconditionally tries to make the
toolchain installation directory read-only, and only outputs a warning
if it's detected that the current user is root.

See previous discussions at bazel-contrib#713, bazel-contrib#749, bazel-contrib#907, bazel-contrib#1008, bazel-contrib#1169, etc.

Fixes bazel-contrib#1169.

---------

Co-authored-by: Richard Levasseur <rlevasseur@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants