Support alternate solution for bazel based C++ builds #19447
Comments
Hi, thanks for your report and for your good suggestions.
The C++ extractor detects invocations of the C++ compiler during the build, and uses that information to determine which files need to be extracted (including e.g. compile-time auto-generated files). This is the reason why the build needs to be full, as the resulting CodeQL database will otherwise be incomplete. An alternative to doing build-based CodeQL database creation is to use build mode
CC @github/codeql-c-extractor
To add to @hvitved's answer and summarize what is scattered in several places, there are 3 pieces of information from the build that we currently use:
Build mode none is indeed not supported at this time for C/C++. In principle we could support distributed builds - as long as everything is getting built - with the help of
Build caches we do not support at this time, and depending on how this is arranged in Bazel, this might need close integration with Bazel, as (a) we would somehow need to see that something is pulled from the cache, and as (b) we would need to know how it would have been compiled/linked if it was not pulled from the cache. Something like ccache might be easier to support in this respect, given that it's effectively a number of symlinks that we could detect.
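To illustrate the symlink point, here is a minimal Python sketch of how a compiler path could be checked for ccache's masquerade setup, where a name such as g++ is a symlink to the ccache binary. This is only an assumption about what "detecting" could look like, not a description of what the CodeQL extractor does; the function name `resolves_to_ccache` is hypothetical.

```python
import os


def resolves_to_ccache(compiler_path):
    """Return True if compiler_path is a symlink whose final target is named 'ccache'.

    Assumption: ccache is installed in its masquerade style, i.e. a symlink
    named after the compiler points at the ccache executable.
    """
    if not os.path.islink(compiler_path):
        # A regular file (or a missing path) is not a masquerade symlink.
        return False
    # realpath follows the whole symlink chain to the final target.
    target = os.path.realpath(compiler_path)
    return os.path.basename(target) == "ccache"
```

A check like this only says that a cache sits in front of the compiler. It does not say which real compiler ccache would have run, which would still need to be worked out separately.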
Thanks for the replies!
I think omitting generated files (if that was the only downside) would be totally acceptable for our use case. I did a quick test of this and got:
Offhand I don't see if there's a user-facing way to allow this "pre-release feature".
Thanks for the added detail @jketema! The linker invocations are an interesting one. The other index formats I mentioned are definitely geared more towards querying information about the final artifacts than how they were built. In the case of bazel it would be trivial to generate this information the same way compile commands are generated. Realistically all of this information is queryable without even building anything, which is how the compile commands generation works as well (unless binary artifacts are also needed, which it doesn't sound like from your message above). The gist of this is we run
I don't think codeql would have to do anything in depth with the caches in order to have closer integration with bazel. I mentioned that piece because if there was some compilation-unit-specific artifact that we could produce instead, we could just build all of those and aggregate them, which naturally would pull from the cache if nothing has changed.
Currently the codeql documentation contains this example for use with bazel based projects:
The gist of this is that you must start from scratch and build the whole project on the local machine without any caching. For many projects that moved to bazel, moving away from this model was one of the core motivations. For large builds that rely on caching / remote execution, it may no longer be feasible to run the entire build on 1 machine, even if it's only on a scheduled cadence. Moreover, the user might not even host the infrastructure to do this anymore, as all of their actual developer and CI builds go through remote execution.
It would be great if there was another way to communicate the necessary information to codeql, in a way that could work alongside bazel's model.
I can't tell from the documentation what the codeql CLI is actually pulling from the build, and I imagine it's quite in depth, but I assume there are other models that could work for providing the same information for certain types of builds.
For example for C++ some natural alternatives (from the user perspective) would be to:
Each of these options would work better in the bazel model, since indexes could be produced remotely, and cached, and compile_commands.json should already be supported for developer workflows.
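For context on what compile_commands.json carries, here is a minimal Python sketch that reads a compilation database and recovers the per-file compiler arguments. It relies only on the standard JSON compilation database fields (`directory`, `file`, and either `command` or `arguments`). The sample entries and the helper names `load_compile_commands` and `include_dirs` are made up for illustration, and nothing here implies that codeql accepts this file as input today.

```python
import json
import shlex

# Hypothetical two-entry database, one entry per compilation unit.
EXAMPLE = """
[
  {"directory": "/work", "file": "src/main.cc",
   "command": "g++ -Iinclude -DNDEBUG -c src/main.cc -o main.o"},
  {"directory": "/work", "file": "src/util.cc",
   "arguments": ["g++", "-I", "include", "-c", "src/util.cc"]}
]
"""


def load_compile_commands(text):
    """Parse a compilation database into dicts with a normalized argv list."""
    entries = []
    for entry in json.loads(text):
        if "arguments" in entry:
            argv = list(entry["arguments"])
        else:
            # "command" is a single shell-escaped string; split it into argv.
            argv = shlex.split(entry["command"])
        entries.append(
            {"directory": entry["directory"], "file": entry["file"], "argv": argv}
        )
    return entries


def include_dirs(argv):
    """Collect -I include directories, in both '-Idir' and '-I dir' forms."""
    dirs = []
    args = iter(argv)
    for arg in args:
        if arg == "-I":
            dirs.append(next(args, ""))
        elif arg.startswith("-I"):
            dirs.append(arg[2:])
    return dirs


if __name__ == "__main__":
    for e in load_compile_commands(EXAMPLE):
        print(e["file"], include_dirs(e["argv"]))
```

This covers the compiler invocation per source file, but as noted in the replies above, linker invocations are not part of a compilation database and would have to come from somewhere else.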
related: