
feat(pypi/parse_requirements): get dists by version when no hash provided #2695


Merged (7 commits) on Apr 2, 2025

@Yanpei-Wang (Contributor) commented Mar 24, 2025

This pull request modifies the SimpleAPI HTML parsing to add a new
field that maps each package version to its sha256 values. This makes
it easy to fall back to all distributions of a particular version when
using experimental_index_url and the hashes are not specified.

The code that decides which packages to query the SimpleAPI for has
also been modified so that it only skips queries for packages that are
included via direct URL references.
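To illustrate the query selection, here is a hedged Python sketch assuming requirement lines in pip's requirements-file format; `packages_to_query` is a hypothetical helper, not the rules_python API. Lines whose requirement is a PEP 508 direct reference (`name @ url`) are skipped, and every other name is normalized as PEP 503 describes before it is queried.

```python
import re

# PEP 508 name, optional [extras], then an optional "@" marking a direct reference.
_REQ_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(?P<at>@)?"
)


def _normalize(name):
    """PEP 503 name normalization: runs of -, _ and . become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def packages_to_query(lines):
    """Hypothetical helper: names to look up on the SimpleAPI, in first-seen order."""
    to_query = []
    for raw in lines:
        # pip treats '#' at line start or after whitespace as a comment.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line or line.startswith("-"):
            continue  # blank line or an option such as --index-url or -r
        match = _REQ_RE.match(line)
        if match is None or match.group("at"):
            continue  # unparsable, or a direct URL reference: no index query
        name = _normalize(match.group("name"))
        if name not in to_query:
            to_query.append(name)
    return to_query
```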

If we fail to get the data from the SimpleAPI, we fall back to pip and
try to install the package via the legacy behaviour.
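A minimal sketch of this fallback, assuming a hypothetical `fetch_index` callable that returns the parsed SimpleAPI data for a package, returns None when the package is absent, or raises OSError on a network failure. Any outcome that yields no usable hashes hands the requirement back to pip; `resolve_dists` and the returned tuple shape are illustrative only.

```python
def resolve_dists(name, version, fetch_index):
    """Returns ("index", sha256s) when the SimpleAPI has data, else ("pip", [])."""
    try:
        data = fetch_index(name)
    except OSError:
        data = None  # network or index failure: treat as "no data"
    shas = (data or {}).get("sha256s_by_version", {}).get(version, [])
    if not shas:
        # Nothing usable from the index: defer to pip's legacy behaviour.
        return ("pip", [])
    return ("index", list(shas))
```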

Fixes #2023
Work towards #260
Work towards #1357
Work towards #2363


Modify _add_dists to fetch files by version if no sha256 hashes are available,
and add corresponding unit tests.

- Updated _add_dists to check for the presence of hashes before fetching by hash.
- Added an `else` condition that iterates through index URLs to find wheels and source distributions matching the requirement's version when no hashes are provided.
- Added unit tests in //tests/pypi/parse_requirements to verify version-based fetching.
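The selection described in this commit can be sketched in Python as follows. `add_dists` is a hypothetical stand-in for `_add_dists`, and the input shapes (a requirement dict with `version` and `hashes`, index data keyed by filename and by version) are assumptions. Pinned hashes take priority; otherwise every file published for the pinned version is used.

```python
def add_dists(requirement, index_data):
    """Hypothetical stand-in for `_add_dists`: picks index files for a requirement.

    requirement: {"version": str, "hashes": list of sha256 hex strings}
    index_data:  {"sha256_by_filename": {filename: sha256},
                  "sha256s_by_version": {version: [sha256, ...]}}
    Returns (whls, sdists) as sorted lists of (filename, sha256) tuples.
    """
    if requirement["hashes"]:
        # Hashes pinned in the lock file: select exactly those files.
        wanted = set(requirement["hashes"])
    else:
        # No hashes: fall back to every file published for the pinned version.
        wanted = set(index_data["sha256s_by_version"].get(requirement["version"], []))
    selected = sorted(
        (filename, sha256)
        for filename, sha256 in index_data["sha256_by_filename"].items()
        if sha256 in wanted
    )
    whls = [item for item in selected if item[0].endswith(".whl")]
    sdists = [item for item in selected if not item[0].endswith(".whl")]
    return whls, sdists
```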
@aignas (Collaborator) previously requested changes Mar 25, 2025

Thanks for the PR. For this to be more robust and performant, it would be great to add an extra field to the SimpleAPI parsing result that stores the sha256 values of the files in the index, keyed by package version.

The code is here:
https://github.com/bazel-contrib/rules_python/blob/main/python/private/pypi/parse_simpleapi_html.bzl#L96

@aignas (Collaborator) commented Mar 29, 2025

OK, I have updated the implementation based on my own feedback and updated the
docs. Could you please test if it works in your environment?

@aignas dismissed their stale review March 29, 2025 14:01

I have become a co-author of the PR, so I'll ask others to review.

@Yanpei-Wang closed this Apr 1, 2025
@Yanpei-Wang reopened this Apr 1, 2025
@Yanpei-Wang (Contributor Author) commented:

> OK, I have updated the implementation based on my own feedback and updated the docs. Could you please test if it works in your environment?

Thank you for your help. It's my first time doing this, and I mistakenly closed the PR, so I reopened it.

@Yanpei-Wang (Contributor Author) commented:

> OK, I have updated the implementation based on my own feedback and updated the docs. Could you please test if it works in your environment?

Thanks for the updates and feedback! I’m still new to this, so I really appreciate your help. I’ve checked the updated implementation and docs, and I’ve learned a lot. I tested it in my environment, and it works great.

@aignas enabled auto-merge April 2, 2025 14:48
@aignas added this pull request to the merge queue Apr 2, 2025
Merged via the queue into bazel-contrib:main with commit 965dd51 Apr 2, 2025
3 checks passed
aignas added a commit to aignas/rules_python that referenced this pull request Apr 3, 2025
Whilst integrating bazel-contrib#2695 I introduced a regression, so this
commit adds a test for it and fixes it. The code that extracted the
filename from the URL was too eager and would break if the URL contained
a git ref, as noted in the test.

Before this commit and bazel-contrib#2695, the code did not handle all of
the cases that are tested now either, so I think we are now in a good
place. I am not sure how we should handle the `git_repository` URLs.
Supporting both `http_archive` and `git_repository` would be nice, but I
am not sure how we can introduce it at the moment.

Work towards bazel-contrib#2363
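One way to avoid over-eager filename extraction is to accept the last URL path segment only when it looks like a distribution archive. This is a hypothetical Python sketch, not the helper changed by the fix: `filename_from_url` and the suffix list are assumptions. A VCS URL such as `git+https://github.com/org/pkg.git@v1.0` yields None instead of the bogus filename `pkg.git@v1.0`.

```python
from urllib.parse import urlsplit

# Archive suffixes treated as distribution files (an assumption for this sketch).
_DIST_SUFFIXES = (".whl", ".tar.gz", ".zip")


def filename_from_url(url):
    """Returns the distribution filename in `url`, or None if there is none."""
    parts = urlsplit(url)  # separates the query string and #fragment from the path
    if "+" in parts.scheme:
        return None  # VCS scheme such as git+https or hg+ssh: no archive name
    last_segment = parts.path.rsplit("/", 1)[-1]
    if last_segment.endswith(_DIST_SUFFIXES):
        return last_segment
    return None
```

Returning None rather than guessing lets the caller treat such requirements as direct references that need separate handling, which is where the open `git_repository` question above comes in.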
github-merge-queue bot pushed a commit that referenced this pull request Apr 5, 2025
@Yanpei-Wang deleted the version branch April 8, 2025 00:43

Successfully merging this pull request may close these issues.

Enable using experimental_index_url without having hashes in the lock file