Add indirect build tracing docs #6667

Merged (6 commits) on Sep 16, 2021
117 changes: 117 additions & 0 deletions docs/codeql/codeql-cli/creating-codeql-databases.rst
@@ -228,6 +228,123 @@ commands that you can specify for compiled languages.
This command runs a custom script that contains all of the commands required
to build the project.

Using indirect build tracing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the CodeQL CLI autobuilders for compiled languages do not work with your CI workflow and you cannot wrap invocations of build commands with ``codeql database trace-command``, you can use indirect build tracing to create a CodeQL database. To use indirect build tracing, your CI system must be able to set custom environment variables for each build action.

Contributor

👋🏻 AFAICT we don't currently explain how to "wrap invocations of build commands with codeql database trace-command".

The section above uses codeql database create with --command='<build command>'. Are these equivalent? (I notice that the trace-command help describes it as a plumbing command.)

I wonder whether it should be more like:

Suggested change
If the CodeQL CLI autobuilders for compiled languages do not work with your CI workflow and you cannot wrap invocations of build commands with ``codeql database trace-command``, you can use indirect build tracing to create a CodeQL database. To use indirect build tracing, your CI system must be able to set custom environment variables for each build action.
If the CodeQL CLI autobuilders for compiled languages do not work with your CI workflow and you cannot specify build commands using the ``--command`` option, you can use indirect build tracing to create a CodeQL database. To use indirect build tracing, your CI system must be able to set custom environment variables for each build action.

Alternatively, we may be missing a section on how to use codeql database trace-command.

Contributor

Yes, codeql database init followed by codeql database trace-command <command> followed by codeql database finalize should be equivalent to codeql database create --command=<command>. The latter is the recommended way to create the DB if you have a single command. The former is the recommended way to create a DB if the build requires multiple commands (and you can't wrap them in a script to make them into a single command) and you can add codeql database trace-command in front of each one. The new indirect tracing option addresses the case where:

  1. You have multiple build commands.
  2. You cannot wrap them with codeql database trace-command.

Contributor

Thanks for the extra information. It sounds as if we could do with a short overview giving the options for tracing the build for compiled languages. Possibly this would be better as part of a follow up PR, but I'll leave @ethanpalm to make the call on this.

Contributor Author

One of the considerations we're trying to balance is to provide indirect tracing as an option for people who need it without directing people toward it unintentionally. This came up in naming and avoiding calling indirect build tracing an advanced option. It feels to me like indirect build tracing would be better introduced as a troubleshooting option rather than one of several options for tracing the build of compiled languages in general.

Perhaps we could add a line at the beginning of "Creating databases for compiled languages", right after "For compiled languages, CodeQL needs to invoke the required build system to generate a database, therefore the build method must be available to the CLI.", that points readers to the indirect build tracing section below if it is relevant to their use case. I think this would direct the people who need indirect build tracing to the procedure without leading others to think they should use it when they don't need to.

Contributor Author (@ethanpalm, Sep 14, 2021)

After a bit more thinking, I am going to open a separate issue for how we introduce this information because I think there are a few different approaches we can take.

Collaborator

I agree with @felicitymay's suggestion about providing a short guide on the options for compiled languages. There are many possibilities, but to me the high level options are:

  • Do you use a well-known build system recognised by the CodeQL autobuilders? Use codeql database create (without a --command argument) to autobuild the code.
  • Do you know the build command line? Use codeql database create ... --command "<build command>"
    • A variation of this is if you have multiple build command lines, in which case you would use trace-command multiple times. I don't think we need to mention that just yet.
  • If neither of the above are suitable, for example if you are using preconfigured build steps from your CI system that do not expose the build command, then use indirect build tracing. Examples of such build steps are the VSBuild and MSBuild tasks in Azure DevOps.

Notably, indirect tracing is not a viable troubleshooting option. Aside from autobuild failing, there's no way to try out a build without it if you don't know the build command.


To create a CodeQL database with indirect build tracing, run the following command from the checkout root of your project:

::

    codeql database init ... --begin-tracing <database>

You must specify:

- ``<database>``: a path to the new database to be created. This directory will
  be created when you execute the command; you cannot specify an existing
  directory.
- ``--begin-tracing``: creates scripts that can be used to set up an environment in which build commands will be traced.

You may specify other options for the ``codeql database init`` command as normal.

.. pull-quote:: Note

    If the build runs on Windows, you must set either ``--trace-process-level <number>`` or ``--trace-process-name <parent process name>`` so that the option points to a parent CI process that will observe all build steps for the code being analyzed.
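As a minimal sketch of the requirements above, the following Python helper assembles the ``codeql database init`` argument list and checks two of the rules stated in this section: the database directory must not exist yet, and on Windows one of the trace-process options must be set. The function ``build_init_command`` and its parameters are hypothetical; only the flags themselves come from this page and from the Azure DevOps example.

```python
from pathlib import Path
from typing import List, Optional


def build_init_command(
    database: str,
    language: str,
    source_root: str = ".",
    trace_process_name: Optional[str] = None,
    trace_process_level: Optional[int] = None,
    on_windows: bool = False,
) -> List[str]:
    """Hypothetical helper: argv for `codeql database init ... --begin-tracing <database>`."""
    # The command creates the database directory, so it must not exist yet.
    if Path(database).exists():
        raise FileExistsError(f"{database} already exists; choose a new path")
    # On Windows, point CodeQL at a parent CI process that observes every build step.
    if on_windows and trace_process_name is None and trace_process_level is None:
        raise ValueError("on Windows, set trace_process_name or trace_process_level")

    argv = ["codeql", "database", "init", "--language", language,
            "--source-root", source_root]
    if trace_process_name is not None:
        argv += ["--trace-process-name", trace_process_name]
    if trace_process_level is not None:
        argv += ["--trace-process-level", str(trace_process_level)]
    argv += ["--begin-tracing", database]
    return argv
```

For example, ``build_init_command("db", "csharp", trace_process_name="Agent.Worker.exe", on_windows=True)`` produces the same options as the initialization step in the Azure DevOps example, in a different order, and could be passed to ``subprocess.run(..., check=True)``.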


The ``codeql database init`` command will output a message::

    Created skeleton <database>. This in-progress database is ready to be populated by an extractor.
    In order to initialise tracing, some environment variables need to be set in the shell your build will run in.
    A number of scripts to do this have been created in <database>/temp/tracingEnvironment.
    Please run one of these scripts before invoking your build command.

    Based on your operating system, we recommend you run: ...
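The message ends with a script recommendation for your operating system. The following Python sketch shows one way a CI wrapper could pick a matching ``start-tracing`` or ``end-tracing`` script from ``<database>/temp/tracingEnvironment``. The mapping is an assumption based only on the file extensions (``.sh`` for POSIX shells, ``.bat`` for ``cmd.exe``, ``.ps1`` for PowerShell); the function name ``tracing_script`` is hypothetical, and the CLI's own recommendation takes precedence.

```python
import platform
from pathlib import Path
from typing import Optional


def tracing_script(database: str, phase: str = "start",
                   system: Optional[str] = None, use_powershell: bool = False) -> Path:
    """Hypothetical helper: path of <phase>-tracing.<ext> for the given OS."""
    if phase not in ("start", "end"):
        raise ValueError("phase must be 'start' or 'end'")
    system = system or platform.system()
    if system == "Windows":
        # Assumption: .ps1 for PowerShell sessions, .bat for cmd.exe.
        ext = "ps1" if use_powershell else "bat"
    else:
        # Assumption: .sh for Linux and macOS shells.
        ext = "sh"
    return Path(database) / "temp" / "tracingEnvironment" / f"{phase}-tracing.{ext}"
```

A shell script such as ``start-tracing.sh`` only affects the build if it is sourced (for example ``. db/temp/tracingEnvironment/start-tracing.sh``) in the same shell that later runs the build, because a child process cannot change its parent's environment.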

The ``codeql database init`` command creates ``<database>/temp/tracingEnvironment`` with files that contain environment variables and values that will enable CodeQL to trace a sequence of build steps. These files are named ``start-tracing.{json,sh,bat,ps1}``. Use one of these files with your CI system's mechanism for setting environment variables for future steps. You can:

Collaborator

This is a good explanation.


* Read the JSON file, process it, and print out environment variables in the format expected by your CI system. For example, Azure DevOps expects ``echo "##vso[task.setvariable variable=NAME]VALUE"``.
* Or, if your CI system persists the environment, source the appropriate ``start-tracing`` script to set the CodeQL variables in the shell environment of the CI system.
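For the first option, here is a minimal Python sketch of reading the JSON file and printing Azure DevOps ``task.setvariable`` logging commands, mirroring what the PowerShell steps in the example below do. It assumes, as those steps do, that ``start-tracing.json`` is a flat JSON object mapping variable names to values; the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_tracing_env(database: str, phase: str = "start") -> Dict[str, str]:
    """Read <database>/temp/tracingEnvironment/<phase>-tracing.json into a name -> value map."""
    path = Path(database) / "temp" / "tracingEnvironment" / f"{phase}-tracing.json"
    data = json.loads(path.read_text(encoding="utf-8"))
    # Assumes a flat JSON object; map null to "" and other values to their string form.
    return {str(name): "" if value is None else str(value) for name, value in data.items()}


def azure_devops_commands(env: Dict[str, str]) -> List[str]:
    """Format each variable as an Azure DevOps `task.setvariable` logging command."""
    return [f"##vso[task.setvariable variable={name}]{value}" for name, value in env.items()]
```

In a pipeline step, printing each line of ``azure_devops_commands(load_tracing_env("db"))`` to standard output would make the variables available to later steps; passing ``phase="end"`` would do the same for ``end-tracing.json``.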

Build your code. Optionally, unset the environment variables by using an ``end-tracing.{json,sh,bat,ps1}`` script from the directory where the ``start-tracing`` scripts are stored. Then run the command ``codeql database finalize <database>``.
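The whole sequence (initialize, build with tracing enabled, finalize) can be sketched as a small Python driver. Instead of sourcing a ``start-tracing`` script, it applies the variables from ``start-tracing.json`` to the environment of each build subprocess; this assumes the JSON file holds the same variables the scripts would set. The driver ``traced_build`` and its ``run`` parameter (which lets you substitute a fake runner for testing) are hypothetical.

```python
import json
import os
import subprocess
from pathlib import Path
from typing import Callable, Dict, Sequence


def _tracing_env(database: str, phase: str) -> Dict[str, str]:
    """Read <database>/temp/tracingEnvironment/<phase>-tracing.json (assumed flat name -> value)."""
    path = Path(database) / "temp" / "tracingEnvironment" / f"{phase}-tracing.json"
    data = json.loads(path.read_text(encoding="utf-8"))
    return {str(k): "" if v is None else str(v) for k, v in data.items()}


def traced_build(database: str, language: str,
                 build_steps: Sequence[Sequence[str]],
                 run: Callable[..., object] = subprocess.run) -> None:
    """Hypothetical driver: init with --begin-tracing, run traced build steps, then finalize."""
    run(["codeql", "database", "init", "--language", language,
         "--source-root", ".", "--begin-tracing", database], check=True)

    # Merge the tracing variables into a copy of the current environment
    # and pass that copy to every build step.
    env = dict(os.environ)
    env.update(_tracing_env(database, "start"))
    for step in build_steps:
        run(list(step), check=True, env=env)

    # No end-tracing step is needed here: the tracing variables only ever
    # existed in `env`, so this process's own environment is unchanged.
    run(["codeql", "database", "finalize", database], check=True)
```

Keeping the variables in a per-subprocess environment is a design choice for scripts that control every build command; in a CI system whose build steps you cannot wrap, use the JSON or ``start-tracing`` script approaches described above instead.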

Once you have created a CodeQL database using indirect build tracing, you can work with it like any other CodeQL database. For example, analyze the database, and upload the results to GitHub if you use code scanning.

Collaborator

Optional: link to the docs on analyzing and uploading?


Example of creating a CodeQL database using indirect build tracing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example shows how you could use indirect build tracing in an Azure DevOps pipeline to create a CodeQL database::

    steps:
    # Download the CodeQL CLI and query packs...
    # Check out the repository ...

    # Run any pre-build tasks, for example, restore NuGet dependencies...

    # Initialize the CodeQL database.
    # In this example, the CodeQL CLI has been downloaded and placed on the PATH.
    - task: CmdLine@2
      displayName: Initialize CodeQL database
      inputs:
        # Assumes the source code is checked out to the current working directory.
        # Creates a database at `<current working directory>/db`.
        # Running on Windows, so specifies a trace process name.
        script: "codeql database init --language csharp --trace-process-name Agent.Worker.exe --source-root . --begin-tracing db"

    # Read the generated environment variables and values,
    # and set them so they are available for subsequent commands
    # in the build pipeline. This is done in PowerShell in this example.
    - task: PowerShell@2
      displayName: Set CodeQL environment variables
      inputs:
        targetType: inline
        script: |
          $json = Get-Content $(System.DefaultWorkingDirectory)/db/temp/tracingEnvironment/start-tracing.json | ConvertFrom-Json
          $json.PSObject.Properties | ForEach-Object {
              $template = "##vso[task.setvariable variable="
              $template += $_.Name
              $template += "]"
              $template += $_.Value
              echo "$template"
          }

    # Execute the pre-defined build step. Note the `msbuildArgs` input.
    - task: VSBuild@1
      inputs:
        solution: '**/*.sln'
        # Disable MSBuild shared compilation for C# builds.
        msbuildArgs: /p:OutDir=$(Build.ArtifactStagingDirectory) /p:UseSharedCompilation=false
        platform: Any CPU
        configuration: Release
        # Execute a clean build, in order to remove any existing build artifacts prior to the build.
        clean: True
      displayName: Visual Studio Build

    # Read and set the generated environment variables to end build tracing. This is done in PowerShell in this example.
    - task: PowerShell@2
      displayName: Clear CodeQL environment variables
      inputs:
        targetType: inline
        script: |
          $json = Get-Content $(System.DefaultWorkingDirectory)/db/temp/tracingEnvironment/end-tracing.json | ConvertFrom-Json
          $json.PSObject.Properties | ForEach-Object {
              $template = "##vso[task.setvariable variable="
              $template += $_.Name
              $template += "]"
              $template += $_.Value
              echo "$template"
          }

    - task: CmdLine@2
      displayName: Finalize CodeQL database
      inputs:
        script: 'codeql database finalize db'

    # Other tasks go here, for example:
    # `codeql database analyze`
    # then `codeql github upload-results` ...

Obtaining databases from LGTM.com
---------------------------------
