
ci: Add script for fetching past test stats from CI #7086

Merged (7 commits) on Apr 12, 2023
ci: Add script for fetching past test stats from CI
mafredri committed Apr 11, 2023
commit d873a70ca70824d2f5611f9bff6d09c99e46ff29
scripts/ci-report/README.md (4 additions, 0 deletions)
@@ -5,3 +5,7 @@ This program generates a CI report from the `gotests.json` generated by `go test
## Limitations

We won't generate any report/stats for tests that weren't run. To find all existing tests, we could use `go test ./... -list=. -json`, but it's probably not worth the time it takes. Usually most tests still run, even when there are failures and we're using `-failfast`.
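If a full list were ever produced, the tests that never ran could be found by comparing it with the names seen in `gotests.json`. A minimal sketch, assuming both lists have already been extracted into plain text files with one test name per line; the `missing_tests` helper and the file names are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# missing_tests ALL_FILE RAN_FILE
# Hypothetical helper: prints test names listed in ALL_FILE (e.g. derived from
# `go test ./... -list=. -json`) that are absent from RAN_FILE (e.g. derived
# from gotests.json). Both files contain one test name per line.
missing_tests() {
	# comm needs sorted input; -23 keeps only lines unique to the first file.
	comm -23 <(sort -u "$1") <(sort -u "$2")
}
```

Usage: `missing_tests all-tests.txt ran-tests.txt`. `comm` is used instead of `grep -v -f` because it does not fail (and trip `set -e`) when every test ran.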

## Misc

The script `fetch_stats_from_ci.sh` can be used to fetch historical stats from CI, e.g. for development or analysis.
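For local analysis, the per-job `*-stats.json` files written by the script can be combined with `jq`. The sketch below assumes, based only on the partial log excerpt in the script's comments, that each file's `stats` object holds a `packages` array of `{name, time}` entries with `time` in seconds; the `package_time_summary` helper is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# package_time_summary FILE...
# Hypothetical helper: slurps the given *-stats.json files and prints one
# tab-separated line per package: name, number of samples and mean "time"
# (assumed to be seconds), slowest package first.
package_time_summary() {
	jq -rs '
		[.[].stats.packages[]]
		| group_by(.name)
		| map({name: .[0].name, runs: length, avg: (map(.time) | add / length)})
		| sort_by(-.avg)
		| .[]
		| [.name, .runs, .avg]
		| @tsv
	' "$@"
}
```

Usage: `package_time_summary scripts/ci-report/ci-stats/*-stats.json`. Because each file also records `branch` and `event`, a `select(.branch == "main")` before flattening would restrict the summary to one branch.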
scripts/ci-report/fetch_stats_from_ci.sh (124 additions, 0 deletions)
@@ -0,0 +1,124 @@
#!/usr/bin/env bash
set -euo pipefail

# Usage: ./fetch_stats_from_ci.sh
#
# This script is for fetching historic test stats from GitHub Actions CI.
Member

What is the next step for this script? I'm wondering if we can keep it as a GitHub Action. The result can be downloaded as a workflow artifact.

Member Author

This script will hopefully not be needed much; its main purpose is to populate the historical data since #6676. The main mechanism for gathering stats will be a webhook.

It can still be useful in the future if someone wants to do some local analysis.

#
# Requires gh with credentials.
#
# https://github.com/cli/cli/blob/trunk/pkg/cmd/run/view/view.go#L434

dir="$(dirname "$0")"/ci-stats
mkdir -p "${dir}"

pushd "${dir}" >/dev/null

# Stats step name, used for filtering log.
job_step_name="Print test stats"

if [[ ! -f list-ci.yaml.json ]]; then
	gh run list -w ci.yaml -L 1000 --json conclusion,createdAt,databaseId,displayTitle,event,headBranch,headSha,name,number,startedAt,status,updatedAt,url,workflowDatabaseId,workflowName \
		>list-ci.yaml.json || {
		rm -f list-ci.yaml.json
		exit 1
	}
fi

runs="$(
	jq -r '.[] | select(.status == "completed") | select(.conclusion == "success" or .conclusion == "failure") | [.databaseId, .event, .displayTitle, .headBranch, .headSha] | @tsv' \
		<list-ci.yaml.json
)"

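while read -r run; do
	# Split the tab-separated fields; the here-string appends a newline, so
	# strip it from the last field.
	mapfile -d $'\t' -t parts <<<"${run}"
	parts[-1]="${parts[-1]%$'\n'}"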

	database_id="${parts[0]}"
	event="${parts[1]}"
	display_title="${parts[2]}"
	head_branch="${parts[3]}"
	head_sha="${parts[4]}"

	run_jobs_file=run-"${database_id}"-"${event}"-jobs.json
	if [[ ! -f "${run_jobs_file}" ]]; then
		echo "Fetching jobs for run: ${display_title} (${database_id}, ${event}, ${head_branch})"
		gh run view "${database_id}" --json jobs >"${run_jobs_file}" || {
			rm -f "${run_jobs_file}"
			exit 1
		}
	fi

	jobs="$(
		jq -r '.jobs[] | select(.name | startswith("test-go")) | select(.status == "completed") | select(.conclusion == "success" or .conclusion == "failure") | [.databaseId, .startedAt, .completedAt, .name, .url] | @tsv' \
			<"${run_jobs_file}"
	)"

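	while read -r job; do
		# A run without completed test-go jobs yields one empty line here; skip it
		# so that `set -u` does not abort on the missing fields below.
		[[ -n "${job}" ]] || continue

		mapfile -d $'\t' -t parts <<<"${job}"
		parts[-1]="${parts[-1]%$'\n'}"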

		job_database_id="${parts[0]}"
		job_started_at="${parts[1]}"
		job_completed_at="${parts[2]}"
		job_name="${parts[3]}"
		job_url="${parts[4]}"

		job_log=run-"${database_id}"-job-"${job_database_id}"-"${job_name}".log
		if [[ ! -f "${job_log}" ]]; then
			echo "Fetching log for: ${job_name} (${job_database_id}, ${job_url})"
			# Example log (partial).
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:18.4063489Z ##[group]Run # Artifacts are not available after rerunning a job,
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:18.4063872Z # Artifacts are not available after rerunning a job,
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:18.4064188Z # so we need to print the test stats to the log.
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:18.4064642Z go run ./scripts/ci-report/main.go gotests.json | tee gotests_stats.json
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:18.4110112Z shell: /usr/bin/bash -e {0}
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:18.4110364Z ##[endgroup]
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:19.3440469Z {
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:19.3441078Z "packages": [
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:19.3441448Z {
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:19.3442927Z "name": "agent",
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:19.3443311Z "time": 17.538
			# test-go (ubuntu-latest) Print test stats 2023-04-11T03:02:19.3444048Z },
			# ...
			gh run view --job "${job_database_id}" --log >"${job_log}" || {
				# Sometimes gh fails to extract the ZIP, etc. :'(
				rm -f "${job_log}"
				echo "Failed to fetch log for: ${job_name} (${job_database_id}, ${job_url}), skipping..."
				continue
			}
			log_lines="$(wc -l "${job_log}" | awk '{print $1}')"
			if [[ ${log_lines} -lt 2 ]]; then
				# Sometimes gh returns nothing and gives no error :'(
				rm -f "${job_log}"
				echo "Log is empty for: ${job_name} (${job_database_id}, ${job_url}), skipping..."
				continue
			fi
		fi

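		# `|| true`: a log without a matching stats step must not abort the script
		# under `set -e`; the jq check below skips the empty result instead.
		job_stats="$(
			grep "${job_name}.*${job_step_name}" "${job_log}" \
				| sed -E 's/.*[0-9-]{10}T[0-9:]{8}\.[0-9]*Z //' \
				| grep -E "^[{}\ ].*" || true
		)"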

		if ! jq -e . >/dev/null 2>&1 <<<"${job_stats}"; then
			# Sometimes Actions logs are partial when fetched via the CLI :'(
			echo "Failed to parse stats for: ${job_name} (${job_database_id}, ${job_url}), skipping..."
			continue
		fi

		job_stats_file=run-"${database_id}"-job-"${job_database_id}"-"${job_name}"-stats.json
		jq \
			--arg event "${event}" \
			--arg branch "${head_branch}" \
			--arg sha "${head_sha}" \
			--arg started_at "${job_started_at}" \
			--arg completed_at "${job_completed_at}" \
			--arg display_title "${display_title}" \
			--arg url "${job_url}" \
			'{event: $event, branch: $branch, sha: $sha, started_at: $started_at, completed_at: $completed_at, display_title: $display_title, url: $url, stats: .}' \
			<<<"${job_stats}" \
			>"${job_stats_file}"
	done <<<"${jobs}"
done <<<"${runs}"
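The `job_stats` extraction is the most fragile step of the script, since it recovers JSON from plain log text. The standalone sketch below replays the same grep/sed/grep steps, with a `|| true` guard so an unmatched log does not trip `set -e`, on a few hand-written lines modeled on the log excerpt in the script's comments. `extract_stats` is a hypothetical name and the sample is not real CI output:

```shell
#!/usr/bin/env bash
set -euo pipefail

# extract_stats LOG_FILE JOB_NAME STEP_NAME
# Hypothetical helper mirroring the job_stats pipeline: keep lines from the
# given job and step, strip everything up to and including the timestamp,
# then keep only lines starting with "{", "}" or a space (the JSON body).
extract_stats() {
	grep "$2.*$3" "$1" \
		| sed -E 's/.*[0-9-]{10}T[0-9:]{8}\.[0-9]*Z //' \
		| grep -E '^[{} ]' || true
}
```

Lines such as `##[endgroup]` and the echoed `run:` script are dropped because they do not start with `{`, `}` or a space, while indented JSON lines keep their indentation (minus the single separator space).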