MNT Add code scanning workflow #28312

lesteve · 2024-01-30T15:19:02Z

Following #28151, I started reading a few security-related blog posts, for example https://securitylab.github.com/research/github-actions-untrusted-input/.

This PR would add Code scanning to our repo and would potentially detect if we try to add an insecure workflow (amongst other things).

I set it up on my fork, which shows that it does protect against a user opening a PR that adds unsafe workflows like those in the GitHub blog post above:
lesteve#31

I created the workflow by clicking the right buttons on GitHub. The only tweak I made was to disable code scanning for C/C++. C/C++ was enabled by default (probably because of the svm code), but it was failing. It is probably possible to enable it, but that requires a custom build command. I don't think it is worth it right now.
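For readers unfamiliar with the generated file, here is a minimal sketch of what an analysis job with C/C++ left out of the language matrix might look like. It is not the exact content of this PR: the action version tags (`actions/checkout@v4`, `github/codeql-action/*@v3`) and the job name are assumptions, and only the matrix entries come from the diff in this thread.

```yaml
# Sketch only; version tags and the job name are assumptions, not taken from this PR.
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # needed to upload code scanning results
      contents: read
    strategy:
      fail-fast: false
      matrix:
        # 'c-cpp' is deliberately absent: it would need a custom build command.
        language: [ 'javascript-typescript', 'python' ]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
      # No build step here: JavaScript/TypeScript and Python are analyzed
      # without compiling the project.
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
        with:
          category: "/language:${{ matrix.language }}"
```

Each matrix entry runs as a separate job, so dropping a language later means removing one list item.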

By the way, it looks like one of our JS files triggers a Medium security alert (screenshot from my fork setup):

github-advanced-security · 2024-01-30T15:20:08Z

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

github-actions · 2024-01-30T15:20:20Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: b79a30b. Link to the linter CI: here

adrinjalali

Curious to see how this evolves. Thanks @lesteve

adrinjalali · 2024-01-30T18:00:48Z

.github/workflows/codeql.yml

+    branches: [ "main" ]
+  pull_request:
+    branches: [ "main" ]


wondering if we should also do release branches, or at least the ones we're doing from now on?

adrinjalali · 2024-01-30T18:01:39Z

.github/workflows/codeql.yml

+    #   - https://gh.io/supported-runners-and-hardware-resources
+    #   - https://gh.io/using-larger-runners
+    # Consider using larger runners for possible analysis time improvements.
+    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}


lesteve · 2024-01-31T05:40:13Z

I removed the explicit swift mentions from the automatically generated workflow and also added *.X branches.

I don't really know whether this is likely to flag the same issue many times in different .X branches ... Edit: I think this is OK since the workflow triggers only on push or PR.

If that happens and it's too noisy, we could, as you mention, start with only 1.4.X and remember to add release branches when we create them.
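As an illustration of the trigger change described above, the `on:` section might look roughly like the following. This is a sketch under the assumption that the release branches are matched with the glob `"*.X"`; the exact pattern used in the merged workflow is not shown in this thread.

```yaml
# Sketch only; the "*.X" pattern is an assumption based on the comment above.
on:
  push:
    # "*" matches any run of characters except "/", so "*.X" covers 1.4.X, 1.5.X, ...
    branches: [ "main", "*.X" ]
  pull_request:
    # For pull_request, the filter applies to the target (base) branch.
    branches: [ "main", "*.X" ]
```

The quotes matter: an unquoted leading `*` would be parsed by YAML as an alias. To start with a single release branch instead, the pattern could be replaced by a literal name such as `"1.4.X"`.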

betatim · 2024-01-31T08:20:15Z

.github/workflows/codeql.yml

+    strategy:
+      fail-fast: false
+      matrix:
+        language: [ 'javascript-typescript', 'python' ]


I was trying to understand if this line means that the workflow will scan JS and Python code only?

I am somewhat interested in getting reports about our Python code, but also somewhat not excited because I assume it'll lead to false positives and "yes I know you think this is dangerous but we are adults and want to do it".

What I would be very excited about is something that vets/scans/checks the workflows and CI configs. At least I think this is where you could manage to sneak things past us or we just misconfigure things to make life easier for attackers. Do you know if something like this exists? (I think it doesn't :()

lesteve

I don't understand why, but I think javascript-typescript actually also checks CI workflows. See for example lesteve#31, the PR in my fork linked above, where malicious workflows are detected, and also this GitHub blog post:

The CodeQL workflow scanning queries are (currently) only included in the query suite for JavaScript [...] If the main programming language of your project is something else, such as Python then you need to [...] add JavaScript as an additional language

I agree with you on the Python code. For example, I never found LGTM very useful when it was enabled, and I admit that my tolerance for false positives is quite low.

The fact that there is no security report on the Python code (tested on my fork) is a good sign for me that there will not be that many false positives. We can always disable python code scanning if we agree it is too much on the noisy side.

Also, I haven't played much with the security report, but it looks like you can tag a defect as a false positive. Hopefully that means it never shows up again; see this for more details.
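To make the kind of unsafe workflow discussed here concrete, the sketch below contrasts the injection-prone pattern described in the GitHub Security Lab post linked at the top of this PR with the usual mitigation. The workflow is hypothetical and written only for illustration; it is not taken from this repository or from lesteve#31.

```yaml
# Hypothetical workflow, for illustration only.
name: Example
on:
  pull_request_target:
    types: [opened]
jobs:
  echo-title:
    runs-on: ubuntu-latest
    steps:
      # Unsafe: the expression is substituted into the script text before the
      # shell runs, so a crafted PR title can inject shell commands.
      - name: Unsafe use of untrusted input
        run: echo "${{ github.event.pull_request.title }}"

      # Safer: pass the value through an environment variable, so the shell
      # sees it as data rather than as part of the script.
      - name: Safer use of untrusted input
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}
        run: echo "$PR_TITLE"
```

The first step is the kind of pattern that workflow scanning is meant to flag in a PR.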

glemaitre · 2024-01-31T10:31:35Z

By the way it looks like one of our js file triggers a Medium security alert

Whoops, I wrote that code. Always knew that I did not know how to code in JavaScript :)

adrinjalali · 2024-02-01T11:14:17Z

@glemaitre @betatim should we merge this then? We can always revert and disable if it gets too noisy.

betatim · 2024-02-02T10:53:09Z

Merged. Let's see what happens. Having something keep an eye on our workflow configs seems useful.

lesteve · 2024-02-02T13:37:48Z

Great, if you want you can now look at security reports in https://github.com/scikit-learn/scikit-learn/security/code-scanning.

MNT Add code scanning workflow

682abe3

adrinjalali reviewed Jan 30, 2024

Tweak

b79a30b

adrinjalali approved these changes Jan 31, 2024

betatim reviewed Jan 31, 2024

betatim merged commit ab476ea into scikit-learn:main Feb 2, 2024

lesteve deleted the code-scanning branch February 2, 2024 13:36

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024

MNT Add code scanning workflow (scikit-learn#28312)

bac79c2
