Skip to content

MNT Add code scanning workflow #28312

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 2, 2024
Merged

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Jan 30, 2024

Folllowing #28151, I started to read a few security-related blog posts, for example https://securitylab.github.com/research/github-actions-untrusted-input/.

This PR would add Code scanning to our repo and would potentially detect if we try to add an insecure workflow (amongst other things).

I set it up on my fork, this shows it does protect against a user that creates a PR to add unsafe workflows like the above Github blog post:
lesteve#31

I created the workflow, clicking on the right buttons on Github. The only tweak I did is to disable code scanning for C/C++. By default C/C++ was enabled (probably because of the svm code), except that it was failing. Probably it is possible to enable it but then you need to give a custom build command. I don't think it is worth it right now.

By the way it looks like one of our js file triggers a Medium security alert (screenshot from my fork setup):
image

@github-advanced-security
Copy link

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

Copy link

github-actions bot commented Jan 30, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: b79a30b. Link to the linter CI: here

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to see how this evolves. Thanks @lesteve

Comment on lines 5 to 7
branches: [ "main" ]
pull_request:
branches: [ "main" ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if we should also do release branches, or at least the ones we're doing from now on?

# - https://gh.io/supported-runners-and-hardware-resources
# - https://gh.io/using-larger-runners
# Consider using larger runners for possible analysis time improvements.
runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swift? O_o

@lesteve
Copy link
Member Author

lesteve commented Jan 31, 2024

I removed the explicit swift mentions from the automatically generated workflow and also added *.X branches.

I don't really know wheter this is likely to highlight many times the same issue in different .X branches ... Edit: I think this is OK since the workflow triggers only on push or PR.

In case that happens and it's too noisy, we could as you mention start with only 1.4.X and remember to add release branches when we create them.

strategy:
fail-fast: false
matrix:
language: [ 'javascript-typescript', 'python' ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to understand if this line means that the workflow will scan JS and Python code only?

I am somewhat interested about getting reports about our Python code, but also somewhat not excited because I assume it'll lead to false positives and "yes I know you think this is dangerous but we are adults and want to do it".

What I would be very excited about is something that vets/scans/checks the workflows and CI configs. At least I think this is where you could manage to sneak things past us or we just misconfigure things to make life easier for attackers. Do you know if something like this exists? (I think it doesn't :()

Copy link
Member Author

@lesteve lesteve Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't understand the reason why, but I think javascript-typescript actually also checks CI workflows ... see for example the PR in my fork I linked above lesteve#31 where malicious workflows are detected in a PR and also this Github blog post:

The CodeQL workflow scanning queries are (currently) only included in the query suite for JavaScript [...] If the main programming language of your project is something else, such as Python then you need to [...] add JavaScript as an additional language

I agree with you on the Python code, for example I never found LGTM very useful when it was enabled ... and I would admit that my tolerance to false positives is quite low.

The fact that there is no security report on the Python code (tested on my fork) is a good sign for me that there will not be that many false positives. We can always disable python code scanning if we agree it is too much on the noisy side.

Also I haven't played too much with the security report but it looks like you can tag a defect as false positive, hopefully that means that it never shows up again, see this for more details.

@glemaitre
Copy link
Member

By the way it looks like one of our js file triggers a Medium security alert

Whoops, I wrote that code. Always knew that I did not how to code in javascript :)

@adrinjalali
Copy link
Member

@glemaitre @betatim should we merge this then? We can always revert and disable if it gets too noisy.

@betatim betatim merged commit ab476ea into scikit-learn:main Feb 2, 2024
@betatim
Copy link
Member

betatim commented Feb 2, 2024

Merged. Let's see what happens. Having something keep an eye on our workflow configs seems useful

@lesteve lesteve deleted the code-scanning branch February 2, 2024 13:36
@lesteve
Copy link
Member Author

lesteve commented Feb 2, 2024

Great, if you want you can now look at security reports in https://github.com/scikit-learn/scikit-learn/security/code-scanning.

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants