Skip to content

Proposal for simplifying pip dependencies #822

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pauldraper opened this issue Sep 3, 2022 · 13 comments
Closed

Proposal for simplifying pip dependencies #822

pauldraper opened this issue Sep 3, 2022 · 13 comments
Labels
Can Close? Will close in 30 days if there is no new activity

Comments

@pauldraper
Copy link

pauldraper commented Sep 3, 2022

Problem

Pip dependencies are implemented awkwardly, causing all sort of caveats.

For example:

  • Pip dependencies by default don't work with bzlmod. You have to vendor requirements.bzl.

  • But vendoring is awkward and requires minimum two bazel runs to work (update lockfile, then re-run repos and copy bzl).

  • And it turns out requirements.bzl doesn't work with annotations anyway (Cannot use annotations with vendored requirements.bzl #821).

  • And you spend an hour figuring all that out because the implementation is so dang overcomplicated.

Probably more too, I just haven't gotten to them yet.

Solution

The core problem is doing stuff in repository rules. You should really do as little as possible in repository rules. Repository rules are for downloading things, and hacky non-hermetic one-offs. It's handicap, and jumping through those hoops hurts everything.

Rather, users should always vendor requirements.bzl.

WORKSPACE.bzl

load(":requirements.bzl", "PIP_PACKAGES")
load("@rules_python//python:pip.bzl", "pip_packages")

pip_packages(PIP_PACKAGES, annotations = {})

BUILD.bazel

# Executable target:
# 1. pip compile a requirements lockfile
# 2. generate the Starlark version
# 3. write the lockfile and bzl to BUILD_WORKSPACE_DIRECTORY
pip_resolve(
  name = "pip",
  requirements_bzl = "requirements.bzl",
  requirements_in = "requirements.in",
  requirements_lock = "requirements.txt",
)

Whenever requirements.in is updated, run bazel run :pip to update files. Easy as that.

As a benefit, annotations are provided directly to pip_packages without shenanigans. (And if there's something advanced not covered by annotations, uses can even consume or modify PIP_PACKAGES itself.)

Simple and robust.

@pauldraper
Copy link
Author

pauldraper commented Sep 3, 2022

Probably more too, I just haven't gotten to them yet.

Update: I don't think it's possible to use a downloaded python interpreter.

Or at least, you can use one, but it'll be platform specific.

@groodt
Copy link
Collaborator

groodt commented Sep 3, 2022

Yes, I agree with you on many points. As a maintainer group we are trying to cleanup the mistakes of the past #807 while also not breaking things too rapidly for existing users.

I too believe the rules should remove most of the eager work happening in repository rules and move a lot closer to rules_jvm_external.

For my own research:

  1. presumably you are a user of pip_parse?
  2. If we provide docs and an option for hermetic, cross-platform installs but only using binary wheels, would you migrate?

@asafflesch
Copy link
Contributor

asafflesch commented Sep 9, 2022

I would like to add to @pauldraper's points - requirements.bzl currently still invokes pip at runtime in order to generate the actual targets from the wheel files. This creates a problem with cross-platform installs in packages like keyring which have conditional dependencies based on sys_platform. Ideally, the file that is vendored should already have the parsed targets generated in order to avoid this, so all we do at runtime is just invoke a bunch of http_archive targets.

An alternate integration of pip into bazel that we're currently using, rules_pip, works in this manner, and apparently the JSON format they used there ended up being remarkably similar to pip install's new installation report functionality.

And @groodt, to chime in on your questions:

  1. In the process of moving over.
  2. Definitely, it's how we work today.

@pauldraper pauldraper reopened this Sep 9, 2022
@pauldraper
Copy link
Author

Thanks for the work on #807 ! I agree that is in the right direction. I suggest further:

  1. Vendor requirements.bzl always. This will be compatible with bzlmod and will avoid running pip compile at load time.
  2. Have requirements.bzl be a simple manifest, which can be turned into repositories by the user. This allows annotations and hermetic cross-platform python to still work with vendoring requirments.bzl.

@FaQA further suggests:

  1. Replace pip install entirely with Starlark downloading and file operations.

@alexeagle
Copy link
Contributor

@pauldraper could you point to more info about why vendoring the requirements.bzl is required for bzlmod to work? It is because the MODULE.bazel file has to repeat the list of dependencies in use_repo calls?

We'll run into the same thing in rules_js https://github.com/aspect-build/rules_js/tree/main/docs#fetch-third-party-packages-from-npm where we still recommend both vendoring and translate-on-the-fly options.

@jvolkman
Copy link
Contributor

Ideally, the file that is vendored should already have the parsed targets generated in order to avoid this, so all we do at runtime is just invoke a bunch of http_archive targets.

This is similar to what I've slowly been working on over at rules_pycross, although I'm piggybacking on Poetry's ability to create cross-platform lock files. I added keyring to example_lock.bzl.

@groodt
Copy link
Collaborator

groodt commented Sep 25, 2022

@FaQA Thanks for your input. Agree that in an ideal world, the full transitive dependency graph for all python runtimes of the user's monorepo would be materialised ahead of time as targets in a .bzl file. As you are aware, python packages do not guarantee static metadata at the moment, so this is a not as easy as we would like. However, as you already are using a "wheel-only" wheelhouse, did you know that you can use https://github.com/bazelbuild/rules_python/blob/main/docs/pip_repository.md#pip_repository-download_only See this for usage instructions #773 (comment) It's not perfect (issues on Windows).

@pauldraper Thanks for your input. As mentioned above, largely agree with your primary points. Just for some further clarifications: pip-tools compile is not used anywhere at load or build time, it's merely a convenience to help users create a requirements.txt. pip install is unused entirely as well. These rules only use pip wheel or pip download to retrieve wheels. Vendoring the .bzl file into the repo currently doesn't gain very much. Although it is recommended. Certainly can look into improving ergonomics or even documenting a small genrule that does similar to your description. The more problematic issue would be if there was a blocker to bzlmod, can you elaborate further?

@pauldraper
Copy link
Author

The more problematic issue would be if there was a blocker to bzlmod, can you elaborate further?

I can't recall the details right now, but IIRC it's basically the fact that most repos operate like

create_repo()
load("@repo//:rules.bzl", "deps")
deps()

but that repo has pip dependencies, it needs a double step.

create_repo()
load("@repo//:rules.bzl", "deps")
deps()
load("@pip_repo//:rules.bzl", "pip_deps")
pip_deps()

I admit I could be wrong about that however.

@asafflesch
Copy link
Contributor

@groodt - Thanks for responding! I am aware of the the download_only option, and in fact do use it. However, it doesn't avoid this issue - I give the example of keyring in my previous post.

If you use keyring on both Linux and OSX, and create repos for Darwin and Linux as detailed in that post, the Darwin package for keyring will be generated with a dependency on secretstorage, which, of course, doesn't otherwise exist in the Darwin repo. You might be able to avoid the issue as long as you're just building py_binary targets, but the moment a genquery target hits anything that depends on keyring, you will have an issue, as both the Linux and Darwin packages will be examined.

@groodt
Copy link
Collaborator

groodt commented Sep 25, 2022

@FaQA Can you please create a repro for the situation you describe? If you're generating platform specific lockfiles, then the situation you describe sounds like a bug. I'm not sure about the genquery part though, genquery has a number of "quirks" so I'm not even sure how well it supports platforms and select.

@asafflesch
Copy link
Contributor

@groodt - I realized I never got back with a reproduction of the issue! Sorry, here it is - https://github.com/Faqa/rules_python_cross_repro.

If you try building the repository there in a Linux env (I've attached a Dockerfile.dazel that you can use in conjunction with the dazel tool to do so quickly if you're on a Macbook), you should immediately see a failure of ERROR: Analysis of target '//:hello_deps' failed; build aborted: no such package '@pip_darwin_secretstorage//': The repository '@pip_darwin_secretstorage' could not be resolved: Repository '@pip_darwin_secretstorage' is not defined

Basically, as you say, genquery has no knowledge of select statements, so it loads all package possibilities. This wouldn't be an issue if the deps for keyring weren't being determined and loaded at runtime.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

@github-actions github-actions bot added the Can Close? Will close in 30 days if there is no new activity label Jun 12, 2023
@github-actions
Copy link

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Can Close? Will close in 30 days if there is no new activity
Projects
None yet
Development

No branches or pull requests

5 participants