Skip to content

MAINT Protect CUDA GPU workflow against timed commits #29286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 18, 2024

Conversation

ogrisel
Copy link
Member

@ogrisel ogrisel commented Jun 18, 2024

This should prevent contributors to push malicious changes in their PR right after a maintainer has manually triggered on their PR.

I guess the only way to test that this works as expected to is to merge to main an trigger test build, although I could not find an exhaustive list of what triggers updates of the value of github.event.repository.updated_at.

@ogrisel ogrisel requested review from thomasjpfan and betatim June 18, 2024 08:41
Copy link

github-actions bot commented Jun 18, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 8380268. Link to the linter CI: here

@adrinjalali
Copy link
Member

Isn't giving a git hash not safer? This implementation seems a bit brittle to me somehow.

@betatim
Copy link
Member

betatim commented Jun 18, 2024

Yes using a hash would be safer. However, it is also more tedious to get the hash from the web UI so you can paste it. So there is a trade off to be made.

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

I am fine with either option. Copy/pasting the commit hash is not that tedious.

The problem is that once we have the git hash it's hard to find which PR holds that hash. So when looking at the list of past runs at https://github.com/scikit-learn/scikit-learn/actions/workflows/cuda-gpu-ci.yml would be hard to find the PR that was tested from the logs.

EDIT: actually it should be quite doable:

https://stackoverflow.com/a/48085242/163740

Updating my preference to use a commit hash.

@lesteve
Copy link
Member

lesteve commented Jun 18, 2024

Favoring the commit hash too, as I mentioned in the emails (probably too long 😅).

Maybe, rather than the PR number, we can use a commit link copied from inside the PR in the trigger workflow dialog box. It seems convenient enough for us maintainers to copy and paste from inside the PR for example here:
image

and the URL seems to have both the PR and the commit:

https://github.com/scikit-learn/scikit-learn/pull/29286/commits/cf16f309ac8a8d0e477b087d9691c8e9e27be23a

In the workflow you can parse the PR number and the commit from the URL and you are good to go, only need to add the right git checkout or git reset --hard command.

The fact that Github issues a warning when I am composing this message may indicate that there are subtleties see https://github.com/refined-github/refined-github/wiki/GitHub-markdown-linkifier-bug:
image

Edit: having read more it looks more like a rendering thing so I don't think it affects the GPU CI workflow.

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

I updated the PR to use a commit hash. WDYT @betatim?

@ogrisel ogrisel closed this Jun 18, 2024
@ogrisel ogrisel reopened this Jun 18, 2024
@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

I mis-clicked... did not intend to close.

@betatim
Copy link
Member

betatim commented Jun 18, 2024

Commit hash is also fine for me (or should we go all in and implement the label based solution in this PR already? It seems we have boarded the "let's fix this properly" train already 😄.

We also need to update build_tools/cuda_ci_status.py as that takes the PR number from its environment (it is part of #29245). The question is: should we take the first PR number found or is there a way to comment on all PRs (can we store all of them in a comma separated env var?)

Or do we not worry for now as the PR hasn't been merged and we could wait for the label solution to be done (in which case fishing the PR number out via git magic isn't needed anymore)?

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

I am not sure I will have time to iterate over this PR today or tomorrow. So I would rather be in favor of merging as is and do the label based solution directly in #29245 to address the need for the PR number.

@lesteve
Copy link
Member

lesteve commented Jun 18, 2024

@ogrisel Has it been tested in some fashion or should we merge only by inspecting the diff?

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

No I did not test on a fork. It would require editing the workflow to remove the run-on clause. I can do it if you think it's really needed.

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

Indeed the current PR is unlikely to find the commit with such a naive checkout:

$ git clone https://github.com/scikit-learn/scikit-learn
Cloning into 'scikit-learn'...
remote: Enumerating objects: 246603, done.
remote: Counting objects: 100% (246603/246603), done.
remote: Compressing objects: 100% (54521/54521), done.
remote: Total 246603 (delta 191914), reused 244993 (delta 191040), pack-reused 0
Receiving objects: 100% (246603/246603), 155.00 MiB | 4.92 MiB/s, done.
Resolving deltas: 100% (191914/191914), done.
$ cd scikit-learn
$ git checkout 8380268ebb7d9952e7787a6b0cf3893b955c9c20
fatal: reference is not a tree: 8380268ebb7d9952e7787a6b0cf3893b955c9c20

@lesteve
Copy link
Member

lesteve commented Jun 18, 2024

What's our feeling about using a URL like https://github.com/scikit-learn/scikit-learn/pull/29286/commits/cf16f309ac8a8d0e477b087d9691c8e9e27be23a as mentioned in #29286 (comment)?

It seems OKish to copy-and-paste into the dialog box and contains the PR number + the commit.

update-lock-files.yml may need to be adapted a bit but I think this is doable.

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

What's our feeling about using a URL like cf16f30 as mentioned in #29286 (comment)?

I find it a bit ugly.

Let me check if the current workflow does not work on my fork first because the following works:

$ git clone https://github.com/scikit-learn/scikit-learn
$ git fetch origin 8380268ebb7d9952e7787a6b0cf3893b955c9c20
$ git checkout 8380268ebb7d9952e7787a6b0cf3893b955c9c20

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

I confirm that the current commit_hash checkout used in this PR works on my fork:

https://github.com/ogrisel/scikit-learn/actions/runs/9566401522/job/26371606982

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

Actually, to really test it I would need someone else to open a PR with an empty commit from your own fork pointing to ogrisel/scikit-learn:main to replicate the actual usage scenario.

@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

It works when dispatching on a commit from another fork's PR:

https://github.com/ogrisel/scikit-learn/actions/runs/9566663488/job/26372488765

Copy link
Member

@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's merge this one, thanks!

@lesteve lesteve merged commit e9b4856 into scikit-learn:main Jun 18, 2024
39 checks passed
@ogrisel
Copy link
Member Author

ogrisel commented Jun 18, 2024

For the record, I am fine with the current state of this PR but I am not opposed to using the full PR+commit URL @lesteve suggested if others prefer it. But some parsing will be needed.

An alternative follow-up would be to completely replace this workflow by a label event PR-scoped workflow which would be inherently secure to cache poisoning attacks on cache contents associated with the main branch.

@ogrisel ogrisel deleted the gpu-ci-timed-commit-protection branch June 18, 2024 14:19
@lesteve
Copy link
Member

lesteve commented Jun 18, 2024

For completeness, the reason I proposed the PR commit URL is that I viewed it a slight usability improvement but I am not will to die on this hill 😉

For example from the diff view tab I review the change in the Files view, I click on commits, I take the last one, right-click copy and paste it in the workflow input box:
image

Copy and pasting a commit I need to be precise with the mouse or go to the commits tab.

@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants