NumPy Security roadmap proposal #29178
Comments
LGTM. I'd be delighted to skip the download/upload to Anaconda; the only reason for it is to generate hashes for the release products, which seems like something that should be automated.
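As a rough illustration of what automating the hash generation could look like, here is a minimal Python sketch. It assumes the release products are wheels and sdists collected in a single directory; the function names and the directory layout are hypothetical and not part of any existing NumPy release tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_release_artifacts(dist_dir: Path) -> dict[str, str]:
    """Map each wheel or sdist filename in dist_dir to its SHA-256 digest.

    Assumption: artifacts are `*.whl` and `*.tar.gz` files directly in dist_dir.
    """
    artifacts = sorted(
        p for p in dist_dir.iterdir()
        if p.is_file() and (p.suffix == ".whl" or p.name.endswith(".tar.gz"))
    )
    return {p.name: sha256_of_file(p) for p in artifacts}
```

A release job could call `hash_release_artifacts(Path("dist"))` after collecting the build outputs and write the result into the release notes, so that no manual download step is needed just to compute hashes.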
One thing I'm still unsure of is the ephemeral runners. Building on those depends on how the build environment is set up by the host. For example, could there be malicious tools within the environment that can inject malware into the build process? How far can we control what that environment looks like, and how far does trust extend? Of course, this argument doesn't necessarily have to be restricted to those hosts, it also applies to e.g. GitHub.
I don't know much about it myself, but perhaps the documentation preview that's generated for (every?) PR is also worth looking at? Because, roughly speaking, it can be seen as a "free" web hosting service that anyone with a gh account can use. So it might be possible to exploit that for evil things like phishing. For example, by opening a PR when the maintainers are asleep. That way, they can get a free evil website that's hosted on output.circle-artifacts.com for a couple of hours, that can't be traced back to them (if they use an anonymous gh account). But then again, I'm not very familiar with it, so there's a good chance that I'm missing something here 🤷🏻
Indeed. For other readers: note that this is only true since very recently; our aarch64/arm64 wheels were built on Cirrus CI rather than GitHub Actions before so we also needed a staging bucket to bring all wheels together due to that (and one cannot push artifacts from Cirrus back to GHA in a reasonable way).
Good question. I don't think the generated html pages are a worry, because (a) they can only contain malicious content if that content is present in the PR diff which is easy to spot, and (b) very few people will actually look at those links (probably zero before checking the diffs). However, the integration of those previews (through this action) is a real security headache. For one because the action requires the most elevated permissions and because it triggers on pretty much everything. For another, because it generates pretty severe log pollution, making it really difficult to spot other irregular jobs running. A peek at https://github.com/numpy/numpy/actions will show that. One of the benefits of moving to a separate repo is very clean logs that are easy to introspect. The CircleCI redirector action is bad enough that I'd love to be able to get rid of it completely even on the main repo - but that's a separate discussion.
There could be of course, but that's similar to GitHub Actions with https://github.com/actions/runner-images/. Using a runner image generated by another provider should be quite safe compared to many other things we're doing. It's possible that the provider does it wrong of course, e.g. there are issues with how the container or VM is mounted so it has access to the bare metal machine it's running on. In that case there can be cross-talk between jobs, even with those from other organizations running on the same underlying infra. But again, not the biggest concern - I'd say trusting Cirrus or IBM is significantly safer than trusting the authors of a bunch of the random GitHub Actions we are using.
That said, the other issue with self-hosted runners is that configuring them and giving them permissions to write job status back to GitHub is a little fiddly. See scipy/scipy#21740 (comment) for what I wrote on doing that on the SciPy repo. If one wants to be really secure and avoid admin settings that are potentially risky if changed/misconfigured (which can go undetected for a long time), then it's best to avoid them. This is the reason why I propose we avoid them on the new repo that we do releases from, but allow them on the main repo - if there are no secrets to compromise on the main repo, the extra risk of allowing them is quite low.
Thinking about this from a "who is everybody we are trusting here" perspective, the list for the release repo will be:
And for each of those, the group of people who have access to them (EDIT: and their transitive dependencies). For the main repo, there are many more things - I won't try to list them exhaustively here, but at a high level:
I am very supportive of this (and expect mpl will follow quickly/help). An additional benefit is that we will be able to rebuild wheels with build numbers 1 if we had to update a vendored library, or rebuild Windows wheels because of C++ library things.
What shall we call the repo,
I had
Sure, but let's give it a couple more days to give people a chance to comment. I'll also ping the mailing list about this proposal.
We haven't made too many major changes to how we deal with security for the NumPy project in a few years. The last significant changes were (a) requiring 2FA for everyone with repo admin privileges or PyPI access, (b) using hashes on all GitHub Actions, and (c) adding OpenSSF scorecards and addressing some of the issues they turned up. The security section of our roadmap is short, and currently says:
NumPy is quite secure - we get only a limited number of reports about potential vulnerabilities, and most of those are incorrect. We have made strides with a documented security policy, a private disclosure method, and maintaining an OpenSSF scorecard (with a high score). However, we have not changed much in how we approach supply chain security in quite a while. We aim to make improvements here, for example achieving fully reproducible builds for all the build artifacts we publish - and providing full provenance information for them.
The supply chain part is critical there, and the world has changed in a few significant ways:
tj-actions/changed-files, used in >23,000 repositories, was compromised.
numpy is distributed in lots of commercial products so a part of our user base is affected.
Some of the issues discussed in that blog post on PyTorch are also relevant for NumPy. For example, our GITHUB_TOKEN is still set to the default write permissions on everything, and the tokens for uploading to our nightly bucket as well as the staging bucket for releases to PyPI are accessible by anyone with write permissions on this repository. Note that it's effectively impossible to store repository secrets safely for a subset of people with commit rights, so currently our more stringent requirements for direct PyPI access are only partially effective.
Here is a useful way to think about types of security threats from Supply-chain Levels for Software Artifacts:
For NumPy, we're probably in decent shape for "source threats" - commits on main and maintenance/* are quite visible so the risk of new commits being slipped in unnoticed is low, and we've never had a CVE that was actually concerning (less concerning CVEs are a pain and need mitigating, but they won't cause large-scale damage). On "build threats" I think we are not in great shape. And as one of the highest-profile Python packages and the second-most-downloaded package with compiled extensions on PyPI, we are a pretty attractive target.
The aim of this issue is to serve as a tracking issue and to discuss at a high level what we want to do. Sub-issues can be created for individual actionable steps.
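Two of the practices mentioned above, using hashes on all GitHub Actions and not leaving GITHUB_TOKEN on its default permissions, can be checked mechanically. The following is a minimal Python sketch of such a check, not an existing tool: the function names are hypothetical, and it scans workflow files line by line rather than parsing YAML, so it will miss unusual formatting.

```python
import re

# Captures the reference in a step such as `- uses: owner/repo@ref  # comment`.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
# A reference counts as pinned only if it ends in a full 40-character commit SHA.
PINNED_RE = re.compile(r"@[0-9a-f]{40}$")
# A `permissions:` key with no indentation is a workflow-level declaration.
TOP_LEVEL_PERMISSIONS_RE = re.compile(r"^permissions:", re.MULTILINE)


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `uses:` references that are not pinned to a commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        # Local actions (./path) live in the same repository, so skip them.
        if ref.startswith("./"):
            continue
        if not PINNED_RE.search(ref):
            found.append(ref)
    return found


def has_top_level_permissions(workflow_text: str) -> bool:
    """Return True if the workflow declares a top-level `permissions:` key."""
    return TOP_LEVEL_PERMISSIONS_RE.search(workflow_text) is not None
```

Running both functions over every file in `.github/workflows/` would flag workflows that rely on a tag such as `@v4` or that never restrict the token explicitly. A real linter would need a YAML parser to also handle job-level `permissions:` blocks and `docker://` references.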
Proposed improvements
Related to source-level access and repository permissions:
GITHUB_TOKEN to read-only default permissions. Any actions that need it will have to explicitly, and granularly, enable write permissions for labels, pull-requests, etc.
Related to building and distributing of release artifacts:
wheels.yml runs, and maybe a security-related linter action
Related to helping downstream distributors and end users with how they approach supply chain security:
Envisioned benefits
The primary benefit is significantly enhanced supply chain security, which is beneficial for both end users and for NumPy as a project (an event with malicious content injected would be pretty stressful for maintainers, and bad for the project's reputation).
Also importantly, it allows the main numpy/numpy repository to continue its relatively loose approach to development, e.g.:
ppc64le (Power architecture) support #29125 there is a proposal for IBM-hosted runners,
The release process will also be easier to manage once trusted publishing is set up (no more anaconda-client and manually downloading/uploading wheels by the release manager).
As yet another benefit: other projects in the scientific Python ecosystem tend to follow what we do, so the effort that the proposed changes will require will be well spent - it has a multiplicative effect when maintainers of other projects can learn from what we do and copy aspects of it.
I'll also note that what this doesn't do is make use of PyPI very secure. It'll be a very significant improvement to how we safeguard our release artifacts (I'd estimate >20x fewer people having access to our release secrets, plus more ability to verify the binaries). However, while PyPI is great for development, if one really cares about security (e.g., use of Python/NumPy inside large corporations or government entities), one should have a coherent strategy like building wheels from source and hosting them on private index servers, or using a commercial vendor of Python packages (there are many options, from commercial Linux distros to Anaconda, ActiveState, Chainguard - and I know more offerings are in the making). We just aim to do the best we can here with limited means.