Support non-standard compilation processes #12

mvanotti · 2020-01-24T02:57:16Z

In a recent talk with @adityasharad, he mentioned that CodeQL tries to detect when a compiler is being invoked. Some projects use goma to speed up the build process by reusing previously built artifacts.

CodeQL seems to ignore all the artifacts that are obtained via goma.

p0 · 2020-01-27T13:12:10Z

To process C++ code, we indeed need to know how to compile it. The easiest way of setting that up is codeql's default, but that indeed relies on observing compilations locally, and distributed build systems like goma or bazel will not work with that approach without disabling the "distributed" aspect.

One possible approach is to leverage compile_commands.json files, as generated by many build systems. The information in them is sufficient to drive the CodeQL tooling, but obtaining such a file is a non-standard, build-system-specific, and sometimes project-specific process. One possibility would be to add support in CodeQL for creating a database based on such compiler settings files. [It's worth noting that this would not be equivalent to tracing a full local build, since CodeQL takes advantage of information from linker invocations too, and those are not represented in compilation command databases.]
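For readers unfamiliar with the format, here is a minimal Python sketch of reading such a file. It assumes the usual JSON compilation database layout (a list of entries with "directory", "file", and either "arguments" or "command"). The function name parse_compile_commands is hypothetical and is not part of CodeQL; the sketch only shows what information is available, and, as noted above, linker invocations are not in it.

```python
import json
import shlex


def parse_compile_commands(text):
    """Parse compilation database JSON into (directory, file, argv) tuples.

    Hypothetical helper for illustration only; not a CodeQL API.
    """
    entries = json.loads(text)
    result = []
    for entry in entries:
        # An entry carries either an "arguments" list or a single
        # shell-style "command" string; normalise both to an argv list.
        if "arguments" in entry:
            argv = list(entry["arguments"])
        else:
            argv = shlex.split(entry["command"])
        result.append((entry["directory"], entry["file"], argv))
    return result
```

Each tuple describes one compiler invocation: the working directory, the source file, and the compiler command line.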

mvanotti · 2020-01-27T18:10:13Z

Maybe it would be good to have a way of determining what is needed for CodeQL to work properly, and then each build system could figure out how to export that data somehow. That way, having extractors for different build systems depends on the community.

Manouchehri · 2020-03-18T19:05:56Z

Using bear + #9 would be a decent solution in my opinion. =)

p0 · 2020-03-18T19:57:54Z

I would expect CodeQL's built-in support to be able to handle any situation where bear would also work. The problem with goma or bazel is that the compilations end up being done in a server process or on a different machine, and are invisible to whatever local monitoring of the build process you are trying to do.

haxmeadroom · 2021-02-16T18:07:39Z

Has CodeQL been tested to work with bazel by disabling the distributed aspect, or is this hypothetical? What changes were made to bazel to accomplish this?

> To process C++ code, we indeed need to know how to compile it. The easiest way of setting that up is codeql's default, but that indeed relies on observing compilations locally, and distributed build systems like goma or bazel will not work with that approach without disabling the "distributed" aspect.

adityasharad · 2021-02-16T18:43:45Z

For Bazel, one approach that we have used successfully is the following. It can be passed as the build command to codeql database create, or used as a run step within a GitHub Actions workflow for CodeQL code scanning.

bazel shutdown; bazel build --spawn_strategy=local --nouse_action_cache //path/to/build/targets/...

shutdown stops all locally-running Bazel servers
--spawn_strategy=local disables the distributed aspect
--nouse_action_cache disables the action cache, increasing the likelihood that all your code is recompiled during the build
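
As a rough illustration, the Python sketch below assembles these steps as argument lists. It assumes the codeql database create options --language and --command, and it runs bazel shutdown as a separate step rather than inside the build command. The helper name plan_codeql_bazel_build and the database name are hypothetical.

```python
import shlex


def plan_codeql_bazel_build(database, targets, language="cpp"):
    """Return the commands to run, in order, as argv lists.

    Hypothetical helper for illustration; adjust flags to your project.
    """
    build = [
        "bazel", "build",
        "--spawn_strategy=local",   # run actions on the local machine
        "--nouse_action_cache",     # force recompilation so compiles are observed
        *targets,
    ]
    return [
        # Stop locally running Bazel servers so the traced build starts fresh.
        ["bazel", "shutdown"],
        # Let CodeQL run and observe the local build.
        [
            "codeql", "database", "create", database,
            f"--language={language}",
            f"--command={shlex.join(build)}",
        ],
    ]
```

Each returned list could be passed to subprocess.run(argv, check=True) from the Bazel workspace root.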

More involved integration into Bazel's dependency graph is possible, but is unlikely to be needed for the majority of use cases. Please try this and let us know if it helps.

haxmeadroom · 2021-02-16T22:57:56Z

The above approach did not work for me. I've also tried many combinations with --spawn_strategy=local, --nouse_action_cache, --batch, --action_env=LD_PRELOAD=...lib64trace.so, etc. I have fuzzgoat, which builds outside of bazel and gets 12 results (in the csv file output). If I build inside bazel, it runs the 142 evaluations but returns no rows in the csv file. I also always get a warning from bazel that LD_PRELOAD is being ignored. My impression is that LD_PRELOAD is required for this to work, right? Any ideas?

adityasharad · 2021-03-02T19:18:25Z

@haxmeadroom could you share more about the project you're building (link if it's open source) and the build commands you're using with and without Bazel?

pestophagous · 2021-06-07T15:47:49Z

I discovered CodeQL this weekend (while digging around in GitHub repo settings looking for other things).

I enabled it to see what would happen, but beyond that I have put essentially zero extra time into reconfiguring my build or trying to get CodeQL to work better on my repo.

Relevant to this bug/enhancement ticket:

My build toolchain is qmake (for now), and it seems that any cc/cpp files built in my qmake build are not being scanned.

My build also uses a submodule pointing to a different project that uses CMake, and when my build first compiles that project (which is a dependency of my app), then those files built with CMake do appear to get scanned. I know this because there are 3 warnings from the submodule codebase.

I'm actually quite pleased to see the scan including the submodule code. (After all, any vulnerabilities in the submodule will become "my" vulns after I link to that library.)

Now I just need the scan to include my code, too!

Sometime (on the weekends, for my weekend-only side project), I am willing to tweak my build script to help the scanning work.

QUESTION:

Where in the Analysis results (in GitHub web UI) or in the CI/Action log can I see a list of all the cc/cpp files that are scanned?

There must be a list (?), so I don't need to keep injecting bad code into files to see if a warning appears.

Here is the PR where I investigated that my own code does not trigger CodeQL warnings: pestophagous/heory#46

I injected the same "Multiplication result converted to larger type" issue into my code to match the issue that I saw trigger a warning in the submodule. But the scan result says "No new or fixed alerts".

adityasharad · 2021-06-07T17:12:12Z

@pestophagous you can see a brief summary of the lines of code seen by CodeQL within your Actions logs here: https://github.com/pestophagous/heory/runs/2765497253?check_suite_focus=true#step:5:257 (Analysis summary for <language>).

We're in the process of rolling out some new features that give you additional diagnostic information about the codebase that was analysed, such as the number of files (or the list of files when running with higher verbosity). Will report back when you can try those out.

pestophagous · 2021-06-07T17:24:08Z

@adityasharad This is all I see when I follow the link to the "brief summary" that you mention:

Analysis produced the following metric data:

|                  Metric                   | Value  |
+-------------------------------------------+--------+
| Total lines of C/C++ code in the database | 775466 |
##[endgroup]
##[group]Analysis summary for cpp
Counted 605060 lines of code for cpp as a baseline.
Analysis produced the following metric data:

|                  Metric                   | Value  |
+-------------------------------------------+--------+
| Total lines of C/C++ code in the database | 775466 |

That provides a "no" answer to "can I see a list of all the cc/cpp files?"

Right? I'm not mad if the answer is "no". I just want to clearly understand if it is yes or no to make sure I didn't misunderstand or follow an incorrect link.

It's great to hear you are working on additional features! I look forward to it. I contributed to this ticket only in the spirit of "giving back" and providing more real-life test cases for the team. I'm not complaining! (How could I, this is all provided free of cost to me!)

Thanks for your reply and interest.

mvanotti · 2021-06-07T18:02:18Z

@pestophagous, I created issue #13 to track what you are asking for. There's a query that will give you the list of files that are in the database.

context: github/codeql-cli-binaries#12

mvanotti · 2021-10-04T22:15:49Z

I have seen that there's a new "Indirect Tracing Mode" for building CodeQL databases. Is this the recommended way to build databases for other build environments (for example, GOMA or RBE)?

Would it be possible to just have something that parses compile_commands.json and emits the env variables that are needed for the codeql cli?

mvanotti · 2021-10-04T23:16:34Z

Ah, my bad, the Indirect Build Tracing still tries to figure out what the extractors are. But it seems like it should detect gomacc, right?

adityasharad · 2021-10-04T23:20:57Z

👋 For goma (as I understand it) the main requirement is that you disable the distributed aspect of the build. If the build is constrained to the local machine, then either a direct command line passed to codeql database create or a sequence of build steps wrapped by CodeQL's indirect build tracing will work. Neither of those features is designed to force the build to run locally, so you must configure your build to do so.

compile_commands.json support is something we see the need for and are discussing at the moment, with the same caveats that p0 described earlier in this issue. Will keep you updated if this makes it onto our roadmap.

mvanotti · 2021-10-04T23:34:09Z

Hi @adityasharad !

I thought the codeql cli only needed to look up the compiler invocations of the commands. My understanding is that when compiling with goma, you just use gomacc to build, instead of your regular compiler. That's why I thought it would be somewhat doable to trace.

AIUI, RBE (Remote Build Execution) uses a similar thing, but uses a different prefix (no gomacc).

So I am wondering what we would need in order to get those recognized by the codeql cli extractors.
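
To make the idea concrete, here is a purely illustrative Python sketch of recognizing a wrapper prefix. It assumes the wrapper is invoked as a prefix to the real compiler command (for example gomacc clang++ ...). It is not how the CodeQL tracer works, and the names WRAPPERS and unwrap_compiler_command are hypothetical.

```python
import os

# Hypothetical set of wrapper executables; only gomacc is named in this thread.
WRAPPERS = {"gomacc"}


def unwrap_compiler_command(argv, wrappers=WRAPPERS):
    """Drop leading wrapper executables and return the remaining argv."""
    rest = list(argv)
    # Compare by basename so an absolute path to the wrapper also matches.
    while rest and os.path.basename(rest[0]) in wrappers:
        rest = rest[1:]
    return rest
```

This only recovers the command line; it does not make a compilation that runs on another machine visible to local tracing.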

dmivankov · 2022-05-15T21:00:12Z

Another option for bazel with strict action_env is to add the following to bazelrc:

# CodeQL build mode
# some vars are defined in https://github.com/github/codeql-action/blob/d7ad71d8034d228d5c8076dc7f058905e272a3fd/src/tracer-config.ts

# CodeQL needs to trace compiler via LD_PRELOAD + some other vars
build:codeql --action_env LD_PRELOAD --action_env ODASA_TRACER_CONFIGURATION --action_env SEMMLE_EXECP --action_env SEMMLE_JAVA_TOOL_OPTIONS --action_env SEMMLE_PRELOAD_libtrace --action_env SEMMLE_PRELOAD_libtrace32 --action_env SEMMLE_PRELOAD_libtrace64 --action_env SEMMLE_COPY_EXECUTABLES_ROOT

# CodeQL needs to compile everything locally and without cache
build:codeql --noremote_accept_cached --remote_upload_local_results=false --spawn_strategy=local

# Pass along CODEQL_* env vars
build:codeql --action_env CODEQL_EXEC_ARGS_OFFSET --action_env CODEQL_EXTRACTOR_JAVA_LOG_DIR --action_env CODEQL_EXTRACTOR_JAVA_RAM --action_env CODEQL_EXTRACTOR_JAVA_ROOT --action_env CODEQL_EXTRACTOR_JAVA_SOURCE_ARCHIVE_DIR --action_env CODEQL_EXTRACTOR_JAVA_THREADS --action_env CODEQL_EXTRACTOR_JAVA_TRAP_DIR --action_env CODEQL_EXTRACTOR_JAVA_WIP_DATABASE --action_env CODEQL_JAVA_HOME --action_env CODEQL_PARENT_ID --action_env CODEQL_PLATFORM --action_env CODEQL_PLATFORM_DLL_EXTENSION --action_env CODEQL_RAM --action_env CODEQL_SCRATCH_DIR --action_env CODEQL_THREADS --action_env CODEQL_DIST --action_env CODEQL_TRACER_LOG

and then use bazel build --config codeql as the build command.

update: above works for java 11 code under bazel 4, java 17 with bazel 5 but not for java 11 with bazel 5

j2kun · 2024-03-31T21:11:56Z

Is anyone aware of a method like @dmivankov's java approach that would suffice for bazel+cpp? I have a compile_commands.json generator available.

adityasharad · 2024-04-01T16:34:00Z

@j2kun could you first try adapting the Bazel example at https://docs.github.com/en/code-security/codeql-cli/getting-started-with-the-codeql-cli/preparing-your-code-for-codeql-analysis#specifying-build-commands (scroll down from that link and look for "Project built using Bazel"). That approach has worked well for us in many scenarios internally and externally; if it doesn't work for your use case please file a separate issue with the details for us. Thanks!

keith · 2025-05-01T15:16:21Z

I created a separate issue for bazel builds here: github/codeql#19447. I don't think the solutions discussed here work for our cases.

adityasharad added the enhancement (New feature or request) label Jul 8, 2020

adityasharad added the CLI label Jul 31, 2020

b-maldoca added a commit to google/maldoca that referenced this issue Jul 6, 2021

adding special local build options to codeql build

99c8a75

context: github/codeql-cli-binaries#12

keith mentioned this issue May 1, 2025

Support alternate solution for bazel based C++ builds github/codeql#19447

Open
