Skip to content

CI Enable Dependabot and use full length commit SHA for PyPI publishing #29180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 6, 2024

Conversation

matthewfeickert
Copy link
Contributor

Reference Issues/PRs

None that I'm aware of.

What does this implement/fix? Explain your changes.

At the 2024 Scientific Python Developer Summit, work on SPEC 08 (scientific-python/summit-2024#9) includes trying to harden publishing security by pinning GitHub Action workflows to commit hashes.

This PR does three things:

Pinning should be done throughout the publishing workflow, but this PR only implements it for the publishing state to start off and get agreement here first.

Any other comments?

cc @thomasjpfan @betatim given discussion at the summit

Copy link

github-actions bot commented Jun 4, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fbbd979. Link to the linter CI: here

- "github-actions"
- "dependencies"
reviewers:
- "jeremiedbb"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added @jeremiedbb as a default reviewer as git log -- .github/workflows/ makes it seem like they are responsible for a lot of this already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think that I've been more active on the maintenance of the workflows than others. I think it's better to add everyone as suggested in #29180 (comment), or maybe no one if possible. We already have bots submitting regular PRs without any pre-assigned reviewer and we don't miss them.

@@ -39,10 +39,13 @@ jobs:
run: |
python build_tools/github/check_wheels.py
- name: Publish package to TestPyPI
uses: pypa/gh-action-pypi-publish@v1.8.5
uses: pypa/gh-action-pypi-publish@81e9d935c883d0b210363ab89cf05f3894778450 # v1.8.14
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewfeickert
Copy link
Contributor Author

This PR is ready for review.

patterns:
- "*"
labels:
- "github-actions"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This label doesn't exist, so I'd drop it.

We could use "Build / CI" though

Copy link
Member

@betatim betatim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a good idea to use hashes for workflows that do critical things like publishing to PyPI, have write permissions to the repository content or that do the wheel building (but I think these aren't GitHub workflows)

Not sure about assigning a reviewer, but going to wait to see what Jeremie says.

@lesteve
Copy link
Member

lesteve commented Jun 5, 2024

In the past, I remember indeed that there were some issues with Dependabot, one of them being that it was running on forks, for example numpy/numpy#18977 but as @matthewfeickert says this has actually gone away since.

Add Dependabot to update GitHub Actions only as a group in a monthly PR so that maintainers never need to update the hashes by themselves

Just to make sure I understand this, that means that:

  • every month there is will be (at most) a single dependabot PR
  • all the actions hashes for all the workflows will be updated in the same PR

More generally, I would be curious about other projects experience with dependabot for updating actions hashes.

One slight worry I have is that the CI in the PR does not actually test that updating the hashes does not break anything. For many workflows, it is actually quite hard to check this in the PR, for example .github/workflows/labeler-title-regex.yml that adds a label based on the title, or .github/workflows/update-lock-files.yml where is something is broken you will realise only the next Monday when the scheduled run happens.

Maybe it is still OK, because by using a monthly update, you kind of hope bugs have been reported and fixed before you update?

Not sure about assigning a reviewer, but going to wait to see what Jeremie says.

Not sure either. I guess it is not possible to assign a team like @scikit-learn/core-devs?

@matthewfeickert matthewfeickert force-pushed the ci/pin-publishing-shas branch from c8f4a54 to 7db9ca4 Compare June 5, 2024 06:08
@matthewfeickert
Copy link
Contributor Author

Just to make sure I understand this, that means that:

  • every month there is will be (at most) a single dependabot PR
  • all the actions hashes for all the workflows will be updated in the same PR

Yes.

More generally, I would be curious about other projects experience with dependabot for updating actions hashes.

It works very well for all of the Scikit-HEP org and Scientific Python in general. c.f. https://learn.scientific-python.org/development/guides/gha-basic/#updating

One slight worry I have is that the CI in the PR does not actually test that updating the hashes does not break anything.

See the PR I linked where it shows the preview of the changes the hash change introduces. If you don't want to update a particular version, you can always tell Dependabot to skip a version until you figure out why your library can't use the latest version of a tool.

I guess it is not possible to assign a team like @scikit-learn/core-devs

This is doable, but is this what you want? If so, that's an easy fix, but I assumed that you would have code owners for different part of the project.

@lesteve
Copy link
Member

lesteve commented Jun 5, 2024

One slight worry I have is that the CI in the PR does not actually test that updating the hashes does not break anything.

See the PR I linked where it shows the preview of the changes the hash change introduces.

Just to make sure I understand, you mean the changelog right, i.e.
image

or something else?

What I was trying to say is that some workflows are not tested by the dependabot PR CI it could introduce subtle breakages, for example "maintainer quality of life" workflows like labeling PR modules based on changed files, weekly scheduled run updating the conda lock-files, etc ...

I guess this may happen rarely in practice but I can image cases where:

  • the dependabot CI is green, so you merge it
  • you realise days or weeks later that it has actually broken one of the workflow. In the update-lock-files.yml case, this is a weekly scheduled run, so maybe you only realise after a few weeks that the automatic PR to update the lock files haven't been created in a while.
  • eventually you figure out that this is due to the dependabot PR, revert it, potentially investigate which action update broke something, figure out how to tell dependabot to stop updating this action until it is fixed, etc ...

I guess it is not possible to assign a team like @scikit-learn/core-devs

This is doable, but is this what you want? If so, that's an easy fix, but I assumed that you would have code owners for different part of the project.

Not sure, happy to hear other scikit-learn developers opinion on this! Personally, I am fine with it being a team responsibility.

@lesteve
Copy link
Member

lesteve commented Jun 5, 2024

Just to be explicit regarding my previous comments, I would be +0.5 for enabling Dependabot, especially if other projects are happy with it, I am mostly trying to imagine/understand the potential issues that can happen.

@ogrisel
Copy link
Member

ogrisel commented Jun 5, 2024

I also have mixed feeling w.r.t. dependabot on github actions workflow for the same reasons Loic expressed, but at the same time, barely never updating the versions of github actions manually is also a problem (e.g. if some older github actions have security vulnerabilities).

* Related to SPEC 08 discussions at the 2024 Scientific Python Developer
  Summit, for publishing actions use the full length commit SHA.
* Update pypa/gh-action-pypi-publish to v1.8.14.
   - c.f. https://github.com/pypa/gh-action-pypi-publish/releases/tag/v1.8.14
* Enable Dependabot for GitHub Action updates only and group updates
  together into a single PR.
* Note that Dependabot is disabled on forks by default.
   - c.f. https://github.blog/changelog/2022-11-07-dependabot-pull-requests-off-by-default-for-forks/
@matthewfeickert matthewfeickert force-pushed the ci/pin-publishing-shas branch from 7db9ca4 to 0f29f46 Compare June 5, 2024 15:36
@matthewfeickert
Copy link
Contributor Author

@lesteve

Just to make sure I understand, you mean the changelog right

Yes, I mean the release notes that you showed.

What I was trying to say is that some workflows are not tested by the dependabot PR CI it could introduce subtle breakages, for example "maintainer quality of life" workflows like labeling PR modules based on changed files, weekly scheduled run updating the conda lock-files, etc.

From git grep scikit-learn currently uses the following actions

actions/cache@v3
actions/checkout@v3
actions/checkout@v4
actions/setup-python@v4
actions/setup-python@v5
actions/upload-artifact@v3
actions/download-artifact@v3
github/codeql-action/init@v3
github/codeql-action/autobuild@v3
github/codeql-action/analyze@v3

pypa/gh-action-pypi-publish@v1.8.5

andymckay/labeler@1.0.4
thomasjpfan/labeler@v2.5.1
peter-evans/create-pull-request@v5
larsoner/circleci-artifacts-redirector-action@master

Note that there is version drift between various workflows and some are fully unpinned. The first block of workflows are GitHub ones that should both be easy to test and very clear what the changes are. The second block is just pypa/gh-action-pypi-publish which is excellently maintained and makes is extremely clear when anything that could break things is changed. The third block are custom user supplied GitHub Actions which are either publicly archived and so will never be updated (https://github.com/andymckay/labeler) — you probably don't want to be using unmaintained code — are maintained by a scikit-learn maintainer and should be easy to verify (https://github.com/thomasjpfan/labeler), or are maintained by Scientific Python and should be easy to verify any changes (https://github.com/scientific-python/circleci-artifacts-redirector-action), leaving only one external workflow that you need to read the release notes carefully on (https://github.com/peter-evans/create-pull-request).

Do you have experience with your labeling workflows being very fragile and easy to break? If not, then this hopefully would require a verify small amount of review effort to verify things before merge.

@lesteve
Copy link
Member

lesteve commented Jun 6, 2024

@matthewfeickert thanks for the details! That's my feeling as well that the vast majority of the workflows we use are widely used and things should break quite rarely when updating their versions.

Just to make sure, once this PR is merged, the fact that dependabot.yml was created will trigger Dependabot and a new Dependabot PR will be created not too long after that?

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@lesteve lesteve changed the title CI Use full length commit SHA for PyPI publishing CI Enable Dependabot and use full length commit SHA for PyPI publishing Jun 6, 2024
@lesteve lesteve merged commit b285de0 into scikit-learn:main Jun 6, 2024
32 checks passed
@lesteve
Copy link
Member

lesteve commented Jun 6, 2024

OK let's merge this and see what happens! We can adapt things based on feed-back, mostly reviewers and labels.

Thanks @matthewfeickert!

@lesteve
Copy link
Member

lesteve commented Jun 6, 2024

Dependabot PR is up at #29203

@matthewfeickert matthewfeickert deleted the ci/pin-publishing-shas branch June 6, 2024 18:08
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants